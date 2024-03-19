Nvidia, a pioneer in artificial intelligence technologies, has reinforced its position as market leader with new hardware that could widen its competitive advantage. On the strength of the H100 AI chip, Nvidia has become a multi-trillion-dollar company, surpassing giants like Alphabet and Amazon. Its latest move may leave competitors even further behind: the introduction of the new Blackwell B200 GPU and the GB200 “superchip”.

During the livestream of its GPU Technology Conference (GTC), Nvidia CEO Jensen Huang presented the new B200 GPU alongside the now-famous H100. Nvidia says the B200 delivers up to 20 petaflops of FP4 compute from its 208 billion transistors. Even more impressive is the GB200, which combines two B200 GPUs with a single Grace CPU and, according to Nvidia, delivers up to 30x the performance for large language model (LLM) inference while cutting cost and power consumption by up to 95% compared with the H100.

According to Nvidia, training a model with 1.8 trillion parameters, which previously required 8,000 Hopper GPUs and 15 megawatts of power, can now be done with 2,000 Blackwell GPUs consuming four megawatts. On a GPT-3 LLM benchmark with 175 billion parameters, the GB200 showed approximately seven times the performance of the H100 and four times the training speed. One of the major improvements is the second-generation Transformer Engine, which doubles compute, bandwidth, and model size by using four bits per neuron instead of eight. A second significant innovation appears when large numbers of these GPUs are connected: a next-generation NVLink switch that lets 576 GPUs communicate with each other, with 1.8 terabytes per second of bidirectional bandwidth.
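
To put the training figures above in proportion, here is a minimal Python sketch that works through the arithmetic. It uses only the numbers quoted in this article, which are Nvidia's own claims. Two things are assumptions made for illustration: the "power per GPU" value is simply total quoted power divided by GPU count (so it includes whatever else the system draws), and "bits per neuron" is treated as bits per stored parameter, ignoring any overhead.

```python
# Figures quoted in the article (Nvidia's claims, not independent measurements).
HOPPER_GPUS, HOPPER_MEGAWATTS = 8_000, 15
BLACKWELL_GPUS, BLACKWELL_MEGAWATTS = 2_000, 4
PARAMETERS = 1_800_000_000_000  # 1.8 trillion

# How many times fewer GPUs and how much less power the Blackwell setup needs.
gpu_reduction = HOPPER_GPUS / BLACKWELL_GPUS
power_reduction = HOPPER_MEGAWATTS / BLACKWELL_MEGAWATTS

# Assumption: total quoted power spread evenly over the GPUs (1 MW = 1,000 kW).
hopper_kw_per_gpu = HOPPER_MEGAWATTS * 1_000 / HOPPER_GPUS
blackwell_kw_per_gpu = BLACKWELL_MEGAWATTS * 1_000 / BLACKWELL_GPUS


def weight_bytes(parameters: int, bits_per_parameter: int) -> int:
    """Bytes needed for the weights alone (assumption: no storage overhead)."""
    return parameters * bits_per_parameter // 8


fp8_terabytes = weight_bytes(PARAMETERS, 8) / 1e12
fp4_terabytes = weight_bytes(PARAMETERS, 4) / 1e12

print(f"GPU count reduced {gpu_reduction:g}x, power reduced {power_reduction:g}x")
print(f"Power per GPU: {hopper_kw_per_gpu:g} kW (Hopper) vs {blackwell_kw_per_gpu:g} kW (Blackwell)")
print(f"Weights at 8 bits: {fp8_terabytes:g} TB, at 4 bits: {fp4_terabytes:g} TB")
```

On these numbers the saving comes from needing a quarter of the GPUs rather than from each GPU drawing less: the per-GPU share of power is roughly the same in both setups. Halving the bits per parameter halves the space the weights occupy, which is one way to read the claim that the same hardware can hold a model twice as large.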

The Blackwell architecture is also notable for adding the FP4 and FP6 formats and for scaling to tens of thousands of GB200 superchips, connected over 800 Gbps networks using the new Quantum-X800 InfiniBand or Spectrum-X800 Ethernet technology. Nvidia also presented the DGX SuperPOD for DGX GB200, which combines eight systems into one for a total of 288 CPUs, 576 GPUs, 240 TB of memory and 11.5 exaflops of FP4 computing power, promising a step change in AI training at scale. Cloud giants such as Amazon, Google, Microsoft and Oracle are already planning to offer Nvidia's NVL72 racks in their cloud services, marking the beginning of a new era in computing power available for artificial intelligence.
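
The DGX SuperPOD totals quoted above can be cross-checked against the per-chip figures given earlier in the article. The short Python sketch below is only a consistency check on the published numbers; the variable names are illustrative and it assumes the totals are split evenly across the eight systems.

```python
# Totals quoted in the article for the DGX SuperPOD with DGX GB200.
SYSTEMS = 8
TOTAL_CPUS, TOTAL_GPUS = 288, 576
TOTAL_FP4_EXAFLOPS = 11.5

# Assumption: CPUs and GPUs are divided evenly across the eight systems.
cpus_per_system = TOTAL_CPUS // SYSTEMS
gpus_per_system = TOTAL_GPUS // SYSTEMS

# Each GB200 superchip pairs two B200 GPUs with one Grace CPU.
gpus_per_cpu = TOTAL_GPUS // TOTAL_CPUS

# 1 exaflop = 1,000 petaflops.
petaflops_per_gpu = TOTAL_FP4_EXAFLOPS * 1_000 / TOTAL_GPUS

print(f"{cpus_per_system} CPUs and {gpus_per_system} GPUs per system")
print(f"{gpus_per_cpu} GPUs per CPU")
print(f"{petaflops_per_gpu:.2f} petaflops of FP4 per GPU")
```

The result is 36 CPUs and 72 GPUs per system, two GPUs per CPU as in the GB200 superchip, and just under 20 petaflops of FP4 per GPU, in line with the peak figure quoted for the B200. The 72 GPUs per system also appears to match the "72" in the NVL72 rack name.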