Google’s AI Supercomputer Outperforms Nvidia A100 Chip
Google has released new details about the supercomputers used to train its AI models. The company claims that its systems are faster and more power-efficient than comparable systems from Nvidia.
Google has developed its own custom chip, the Tensor Processing Unit (TPU), which it uses for more than 90% of its AI training work. The company has now released details about the fourth-generation TPU.
In a scientific paper, Google described how it connected more than 4,000 TPUs into a supercomputer using custom-developed optical switches. The connections between chips have become a key point of competition among companies that build AI supercomputers.
Rapidly Growing AI Models:
Large language models that power technologies like Google’s Bard or OpenAI’s ChatGPT have exploded in size, making them far too large to fit on a single chip. The models must instead be split across thousands of chips, which then work together for weeks or more to train the model.
Google’s PaLM model, its largest publicly disclosed language model to date, was trained by splitting it across two of the 4,000-chip supercomputers over 50 days.
Google said that its supercomputers make it easy to reconfigure connections between chips on the fly, which helps avoid problems and allows tuning for performance gains.
Google’s AI Advancements:
In a blog post about the system, Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote, “This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model.”
Google’s AI supercomputer has been online inside the company since 2020 in a data center in Mayes County, Oklahoma. The startup Midjourney used the system to train its model, which generates fresh images after being fed a few words of text.
In the paper, Google said that for comparably sized systems, its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia’s A100 chip.
Google did not compare its fourth-generation TPU to Nvidia’s current flagship H100 chip because the H100 came to market after Google’s chip and is made with newer technology.
Google hinted that it might be working on a new TPU that would compete with the Nvidia H100 but provided no details. Jouppi told Reuters that Google has “a healthy pipeline of future chips.”