Meta claims that its new AI Research SuperCluster (RSC) is “among the fastest AI supercomputers running today” and will be the fastest globally when it is fully built out by mid-2022. The company began designing the new computing infrastructure in early 2020 to accelerate the training of large AI models, with the goal of one day training models with over a trillion parameters on datasets equivalent to 36,000 years of high-quality video.
“Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform — the metaverse, where AI-driven applications and products will play an important role,” Facebook’s parent company wrote in a blog post.
Meta says it has been committed to long-term investment in AI since 2013, when it created the Facebook AI Research lab. The first generation of high-performance computing infrastructure from Meta’s AI research team was built in 2017 and contains 22,000 NVIDIA V100 Tensor Core GPUs in a single cluster.
“RSC today comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs — with each A100 GPU being more powerful than the V100 used in our previous system,” the company says. While the supercomputer is currently operational, the company says it plans to increase the number of GPUs to 16,000.
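As a quick sanity check on the figures above (assuming the standard eight-GPU configuration of an NVIDIA DGX A100 system, per NVIDIA's published specs), the totals line up:

```python
# Back-of-the-envelope check of the reported RSC figures.
# Assumption: each NVIDIA DGX A100 system houses 8 A100 GPUs.
DGX_A100_SYSTEMS = 760
GPUS_PER_DGX_A100 = 8

total_gpus = DGX_A100_SYSTEMS * GPUS_PER_DGX_A100
print(total_gpus)  # 6080, matching the count Meta reports

# The planned build-out to 16,000 GPUs would roughly 2.6x current capacity.
growth_factor = 16_000 / total_gpus
print(round(growth_factor, 2))  # 2.63
```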
“All this infrastructure must be extremely reliable, as we estimate some experiments could run for weeks and require thousands of GPUs. Lastly, the entire experience of using RSC has to be researcher-friendly so our teams can easily explore a wide range of AI models,” Meta adds.