@www.linkedin.com
//
Nvidia posted the fastest time-to-train results in MLPerf Training v5.0, with its Blackwell GB200 accelerators setting records across the suite. Several companies submitted to this round, but Nvidia's platform led in particular on the most demanding large language model (LLM) test, Llama 3.1 405B pretraining. The results reflect how quickly AI training requirements are growing and show the Blackwell architecture keeping pace with them.
Nvidia's was the only platform with submissions on every MLPerf Training v5.0 benchmark. Its at-scale submissions ran on two AI supercomputers powered by the Blackwell platform: Tyche, built from GB200 NVL72 rack-scale systems, and Nyx, based on DGX B200 systems. Nvidia also collaborated with CoreWeave and IBM on a submission that used a cluster of 2,496 Blackwell GPUs and 1,248 Grace CPUs. On the new Llama 3.1 405B pretraining benchmark, Blackwell delivered 2.2x the performance of the previous-generation architecture at the same scale. Nvidia attributes the gains to advances in the Blackwell architecture: high-density liquid-cooled racks, 13.4TB of coherent memory per rack, fifth-generation NVLink and NVLink Switch interconnects for scale-up, and Quantum-2 InfiniBand networking for scale-out. The NVIDIA NeMo Framework software stack also contributes, and Nvidia positions the combination as the basis for training next-generation multimodal LLMs. AMD showed generational performance gains of its own, but Nvidia's GPUs outpaced AMD's MI325X in the MLPerf benchmarks.
@www.linkedin.com
//
Nvidia's Blackwell GPUs have achieved top rankings in the latest MLPerf Training v5.0 benchmarks, demonstrating breakthrough performance across various AI workloads. The NVIDIA AI platform delivered the highest performance at scale on every benchmark, including the most challenging large language model (LLM) test, Llama 3.1 405B pretraining. Nvidia was the only vendor to submit results on all MLPerf Training v5.0 benchmarks, highlighting the versatility of the NVIDIA platform across a wide array of AI workloads, including LLMs, recommendation systems, multimodal LLMs, object detection, and graph neural networks.
The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. Nvidia collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs. The GB200 NVL72 systems achieved 90% scaling efficiency up to 2,496 GPUs, improving time-to-convergence by up to 2.6x compared with the Hopper-generation H100. The MLPerf Training v5.0 suite introduces a pretraining benchmark based on Llama 3.1 405B, the largest model yet included in the training benchmark suite. On this benchmark, Blackwell delivered 2.2x greater performance than the previous-generation architecture at the same scale. On the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA DGX B200 systems, each powered by eight Blackwell GPUs, delivered 2.5x more performance than a submission using the same number of GPUs in the prior round. These gains reflect advances in the Blackwell architecture and software stack, including high-density liquid-cooled racks, fifth-generation NVLink and NVLink Switch interconnect technologies, and NVIDIA Quantum-2 InfiniBand networking.
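For readers unfamiliar with the terms, the "scaling efficiency" and "x-times performance" figures above can be read as simple ratios of time-to-train. The sketch below shows one common way to compute them. The function names and all timings are hypothetical, chosen only to illustrate the arithmetic; they are not MLPerf results, and the source does not state the exact baseline Nvidia used for its 90% figure.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """How many times faster the new run reached the target quality."""
    return baseline_minutes / new_minutes


def scaling_efficiency(base_gpus: int, base_minutes: float,
                       scaled_gpus: int, scaled_minutes: float) -> float:
    """Fraction of ideal linear speedup retained when adding GPUs.

    1.0 means that using k times as many GPUs cut time-to-train by exactly k.
    """
    ideal = scaled_gpus / base_gpus            # speedup if scaling were perfect
    actual = speedup(base_minutes, scaled_minutes)
    return actual / ideal


# Hypothetical example: 4x the GPUs yields a 3.6x faster run.
eff = scaling_efficiency(base_gpus=8, base_minutes=90.0,
                         scaled_gpus=32, scaled_minutes=25.0)
print(f"scaling efficiency: {eff:.0%}")
```

Under this definition, a 90% efficiency means the larger cluster delivers 90% of the speedup that perfectly linear scaling would give.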