References: @www.nextplatform.com, AWS News Blog, AIwire
Nvidia's latest Blackwell GPUs are rapidly gaining traction in cloud deployments, signaling a significant shift in AI hardware accessibility for businesses. Amazon Web Services (AWS) has announced its first UltraServer supercomputers, pre-configured systems powered by Nvidia's Grace CPUs and the new Blackwell GPUs. These U-P6e instances are available in full-rack and half-rack configurations and use fifth-generation NVLink (NVLink 5) ports to build large shared-memory compute complexes, yielding a single memory domain that spans up to 72 GPU sockets: effectively one massive, unified computing environment designed for intensive AI workloads.
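To software, such a rack-scale NVLink domain simply looks like one large pool of interconnected GPUs. As a rough illustration (this is not AWS's provisioning API, just a generic distributed-PyTorch sketch assuming a torchrun launch with one process per GPU), a single collective operation can span every GPU in the domain:

```python
# Minimal sketch: one collective across an NVL72-style GPU domain.
# Assumes the job is launched with torchrun (one process per GPU);
# rank and world-size env vars come from the launcher, not this script.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # NCCL uses NVLink where available
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a shard; one all-reduce aggregates across the whole domain.
    shard = torch.ones(1024, device="cuda") * dist.get_rank()
    dist.all_reduce(shard, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"world_size={dist.get_world_size()}, sum of ranks={shard[0].item():.0f}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```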
Adding to the growing adoption, CoreWeave, a prominent AI cloud provider, has become the first to offer NVIDIA RTX PRO 6000 Blackwell GPU instances at scale. This move promises substantial performance improvements for AI applications, with reports of up to 5.6x faster LLM inference compared to previous generations. CoreWeave's commitment to early deployment of Blackwell technology, including the NVIDIA GB300 NVL72 systems, is setting new benchmarks in rack-scale performance. By combining Nvidia's cutting-edge compute with its specialized AI cloud platform, CoreWeave aims to provide a more cost-efficient yet high-performing alternative for companies developing and scaling AI applications, supporting everything from training massive language models to multimodal inference.

The widespread adoption of Nvidia's Blackwell GPUs by major cloud providers like AWS and specialized AI platforms like CoreWeave underscores the increasing demand for advanced AI infrastructure. The trend is further highlighted by Nvidia's recent milestone of becoming the world's first $4 trillion company, a testament to its leading role in the AI revolution. Moreover, countries like Indonesia are actively pursuing sovereign AI goals, partnering with companies like Nvidia, Cisco, and Indosat Ooredoo Hutchison to establish AI Centers of Excellence. These initiatives aim to foster localized AI research, develop local talent, and drive innovation, ensuring that nations can harness the power of AI for economic growth and digital independence.
References: @www.linkedin.com
Nvidia has once again asserted its dominance in the AI training landscape with the release of the MLPerf Training v5.0 results. The company's Blackwell GB200 accelerators achieved record time-to-train scores, showcasing a significant leap in performance. This latest benchmark suite included submissions from various companies, but Nvidia's platform stood out, particularly in the most demanding large language model (LLM)-focused test involving Llama 3.1 405B pretraining. These results underscore the rapid growth and evolution of the AI field, with the Blackwell architecture demonstrably meeting the heightened performance demands of next-generation AI applications.
The MLPerf Training v5.0 results highlight Nvidia's commitment to versatility: it was the only platform to submit results across every benchmark. The at-scale submissions leveraged two AI supercomputers powered by the Blackwell platform: Tyche, built using GB200 NVL72 rack-scale systems, and Nyx, based on DGX B200 systems. Additionally, Nvidia collaborated with CoreWeave and IBM, utilizing a cluster of 2,496 Blackwell GPUs and 1,248 Grace CPUs. On the new Llama 3.1 405B pretraining benchmark, Blackwell delivered 2.2x greater performance than the previous-generation architecture at the same scale.

The performance gains are attributed to advancements in the Blackwell architecture, encompassing high-density liquid-cooled racks, 13.4TB of coherent memory per rack, and fifth-generation NVLink and NVLink Switch interconnect technologies for scale-up, as well as Quantum-2 InfiniBand networking for scale-out. These technological innovations, combined with the NVIDIA NeMo Framework software stack, are raising the bar for next-generation multimodal LLM training. While AMD showcased generational performance gains, Nvidia's GPUs reigned supreme, outpacing AMD's MI325X in the MLPerf benchmarks and solidifying Nvidia's position as a leader in AI training capabilities.
References: @www.linkedin.com
Nvidia's Blackwell GPUs have achieved top rankings in the latest MLPerf Training v5.0 benchmarks, demonstrating breakthrough performance across various AI workloads. The NVIDIA AI platform delivered the highest performance at scale on every benchmark, including the most challenging large language model (LLM) test, Llama 3.1 405B pretraining. Nvidia was the only vendor to submit results on all MLPerf Training v5.0 benchmarks, highlighting the versatility of the NVIDIA platform across a wide array of AI workloads, including LLMs, recommendation systems, multimodal LLMs, object detection, and graph neural networks.
The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. Nvidia collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs. The GB200 NVL72 systems achieved 90% scaling efficiency up to 2,496 GPUs, improving time-to-convergence by up to 2.6x compared to the Hopper-generation H100.

The new MLPerf Training v5.0 suite introduces a pretraining benchmark based on the Llama 3.1 405B generative AI system, the largest model yet included in the training benchmark suite. On this benchmark, Blackwell delivered 2.2x greater performance compared with the previous-generation architecture at the same scale. Furthermore, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA DGX B200 systems, powered by eight Blackwell GPUs, delivered 2.5x more performance compared with a submission using the same number of GPUs in the prior round. These performance gains reflect advancements in the Blackwell architecture and software stack, including high-density liquid-cooled racks, fifth-generation NVLink and NVLink Switch interconnect technologies, and NVIDIA Quantum-2 InfiniBand networking.
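For context, scaling efficiency here is just achieved speedup divided by ideal linear speedup. A minimal sketch of the arithmetic, using illustrative placeholder numbers rather than the actual MLPerf timings:

```python
# Scaling efficiency: achieved speedup relative to ideal linear scaling.
# The numbers below are illustrative placeholders, not real MLPerf timings.

def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    ideal_speedup = scaled_gpus / base_gpus          # e.g. 20x the GPUs
    achieved_speedup = base_minutes / scaled_minutes  # e.g. 18x faster
    return achieved_speedup / ideal_speedup

# A run that is ~18x faster on 20x the GPUs is scaling at ~90% efficiency.
print(scaling_efficiency(base_gpus=128, base_minutes=400,
                         scaled_gpus=2560, scaled_minutes=22.2))  # ~0.90
```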
References :
Heng Chi@AI Accelerator Institute
//
AI is revolutionizing data management and analytics across various platforms. Amazon Web Services (AWS) is facilitating the development of high-performance data pipelines for AI and Natural Language Processing (NLP) applications, utilizing services like Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker. These pipelines are essential for ingesting, processing, and providing output for training, inference, and decision-making at a large scale, leveraging AWS's scalability, flexibility, and cost-efficiency. AWS's auto-scaling options, seamless integration with ML and NLP workflows, and pay-as-you-go pricing model make it a preferred choice for businesses of all sizes.
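As a concrete illustration of the ingestion pattern, here is a minimal, hedged sketch of one pipeline stage: an S3-triggered Lambda handler that normalizes incoming text and writes it to a processed bucket. The bucket name and the normalization step are hypothetical placeholders, and a production pipeline would typically hang Glue jobs and SageMaker training or inference off the processed data downstream.

```python
# Minimal sketch of an S3-triggered ingestion step for an NLP pipeline.
# Bucket names and the normalization logic are illustrative placeholders;
# downstream Glue/SageMaker stages are omitted.
import json
import boto3

s3 = boto3.client("s3")
PROCESSED_BUCKET = "my-nlp-pipeline-processed"  # hypothetical bucket name

def handler(event, context):
    """Lambda entry point: fired by an S3 ObjectCreated event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Trivial normalization as a stand-in for real NLP preprocessing.
        doc = {"source_key": key, "text": " ".join(body.lower().split())}

        s3.put_object(
            Bucket=PROCESSED_BUCKET,
            Key=f"processed/{key}.json",
            Body=json.dumps(doc).encode("utf-8"),
        )
    return {"status": "ok"}
```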
Microsoft is simplifying data visualization with its new AI-powered tool, Data Formulator. This open-source application, developed by Microsoft Research, uses large language models (LLMs) to transform data into charts and graphs, even for users without extensive data-manipulation and visualization experience. Data Formulator differentiates itself with an intuitive user interface and hybrid interactions, bridging the gap between a visualization idea and its actual creation: users express intent through natural language supplemented with drag-and-drop interactions, while the AI handles the complex transformations in the background.

Yandex has released Yambda, the world's largest publicly available event dataset, to accelerate recommender-systems research and development. The dataset contains nearly 5 billion anonymized user interaction events from Yandex Music, offering a valuable resource for bridging the gap between academic research and industry-scale applications. Yambda addresses the scarcity of large, openly accessible datasets in the recommender-systems field, which has traditionally lagged behind other AI domains due to the sensitive nature and commercial value of behavioral data.

Additionally, Dremio is collaborating with Confluent's TableFlow to provide real-time analytics on Apache Iceberg data, enabling users to stream data from Kafka into queryable tables without manual pipelines, accelerating insights and reducing ETL complexity.
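To give a sense of what working with an event dataset like Yambda looks like in practice, here is a minimal exploration sketch. The file path and column names are hypothetical placeholders; the actual schema and distribution format are documented with the Yandex release.

```python
# Minimal sketch: exploring one shard of a large interaction-event dataset.
# The file path and column names are hypothetical placeholders; consult the
# Yambda release notes for the real schema.
import pandas as pd

events = pd.read_parquet("yambda/listens/shard-0000.parquet")  # hypothetical path

# Basic shape of the behavioral data: who interacted with what, and when.
print(events.shape)
print(events.head())

# Example aggregate: interaction counts per item, a common recsys starting point.
top_items = events.groupby("item_id").size().sort_values(ascending=False).head(10)
print(top_items)
```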
References :
Aaron Klotz@tomshardware.com
//
NVIDIA has recently announced a significant breakthrough in AI inference, achieving a new world record with its DGX B200 Blackwell node. This node, equipped with eight Blackwell GPUs, surpassed the 1,000 tokens per second (TPS) per user barrier while running Meta’s Llama 4 Maverick large language model. According to a report by Artificial Analysis, the DGX B200 node achieved 1,038 TPS/user, outperforming previous record holders like SambaNova, who achieved 792 TPS/user. This advancement showcases the immense capabilities of the Blackwell architecture and sets a new standard for AI performance.
NVIDIA achieved this record-breaking performance through extensive software optimization, using TensorRT together with Eagle-3-based speculative decoding (a simplified sketch of the technique appears below). These optimizations delivered a 4x performance uplift over Blackwell's prior best results. Further gains came from FP8 data types, optimized attention operations, and the Mixture of Experts (MoE) technique, improvements that boosted speed while maintaining response accuracy. At their highest-throughput configuration, NVIDIA's Blackwell GPUs reached 72,000 TPS per server.

In addition to AI performance, NVIDIA is also revolutionizing AI data center infrastructure through a collaboration with Navitas Semiconductor. Together they are introducing a new 800V HVDC architecture designed to replace the aging 54V systems currently in use. The new architecture is expected to deliver up to 5% better power efficiency and 70% lower maintenance costs, and the transition to 800V enables a 45% reduction in copper wire thickness, significantly lowering material use and weight while producing less heat and fewer losses.
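Speculative decoding, the core idea behind the Eagle-style speedup, lets a small draft model propose several tokens that the large target model then verifies in a single forward pass. The following is a simplified, framework-agnostic sketch of the greedy accept/reject loop, not NVIDIA's actual TensorRT implementation; both model objects are hypothetical stand-ins.

```python
def speculative_decode(draft_model, target_model, tokens, num_new, k=4):
    """Greedy speculative decoding sketch (illustration only).

    Assumed placeholder interfaces:
      draft_model.next_token(seq)       -> one cheap next-token guess
      target_model.verify(seq, start)   -> target's greedy choice at every
                                           position from `start` through
                                           len(seq), i.e. k+1 tokens computed
                                           in a single forward pass
    """
    target_len = len(tokens) + num_new
    while len(tokens) < target_len:
        base = len(tokens)

        # 1. Draft model cheaply proposes up to k tokens, one at a time.
        proposal = list(tokens)
        for _ in range(k):
            proposal.append(draft_model.next_token(proposal))

        # 2. Target model checks every proposed position in one pass.
        verified = target_model.verify(proposal, start=base)

        # 3. Keep the longest agreeing prefix, plus one token produced by
        #    the target itself, so each round emits at least one token.
        accepted = 0
        for i in range(k):
            if proposal[base + i] == verified[i]:
                accepted += 1
            else:
                break
        tokens = proposal[:base + accepted] + [verified[accepted]]
    return tokens[:target_len]
```

The speedup comes from step 2: verifying k drafted tokens costs roughly one forward pass of the large model, so whenever the draft model guesses well, several tokens are emitted for the price of one.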
References: @blogs.nvidia.com
Nvidia is significantly expanding its AI infrastructure initiatives by introducing NVLink Fusion, a technology that allows for the integration of non-Nvidia CPUs and AI accelerators with Nvidia's GPUs. This strategic move aims to provide customers with more flexible and customizable AI system configurations, broadening Nvidia's reach in the rapidly growing data center market. Key partnerships are already in place with companies like Qualcomm, Fujitsu, Marvell, and MediaTek, as well as design software firms Cadence and Synopsys, to foster a robust and open ecosystem. This approach allows Nvidia to remain central to the future of AI infrastructure, even when systems incorporate chips from other vendors.
Nvidia is also solidifying its presence in Taiwan, establishing a new office complex near Taipei that will serve as its overseas headquarters. The company is collaborating with Foxconn to build an "AI factory" in Taiwan that will utilize 10,000 Nvidia Blackwell GPUs. This facility is intended to bolster Taiwan's AI infrastructure and support local organizations in adopting AI technologies across various sectors. TSMC, Nvidia's primary chip supplier, plans to leverage this supercomputer for research and development, aiming to develop the next generation of AI chips.

Furthermore, Nvidia is working with Taiwan's National Center for High-Performance Computing (NCHC) to develop a new AI supercomputer. The system will feature over 1,700 GPUs, GB200 NVL72 rack-scale systems, and an HGX B300 system based on the Blackwell Ultra platform, all connected via Quantum InfiniBand networking. Expected to launch later this year, the supercomputer promises an eightfold performance increase over its predecessor for AI workloads, giving researchers enhanced capabilities to advance their projects. Academic institutions, government agencies, and small businesses in Taiwan will be able to apply for access to accelerate their AI initiatives.
References: @blogs.nvidia.com
Cadence has unveiled the Millennium M2000 Supercomputer, a powerhouse featuring NVIDIA Blackwell systems, aimed at revolutionizing AI-driven engineering design and scientific simulations. This supercomputer integrates NVIDIA HGX B200 systems and NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, coupled with NVIDIA CUDA-X software libraries and Cadence's optimized software. The result is a system capable of delivering up to 80 times higher performance compared to its CPU-based predecessors, marking a significant leap forward in computational capability for electronic design automation, system design, and life sciences workloads.
This collaboration between Cadence and NVIDIA is set to enable engineers to run massive simulations, leading to breakthroughs in fields including the design and development of autonomous machines, drug molecules, semiconductors, and data centers; the Millennium M2000 harnesses accelerated software from NVIDIA and Cadence for applications such as circuit simulation, computational fluid dynamics, data center design, and molecular design. NVIDIA founder and CEO Jensen Huang highlighted the transformative potential of AI, stating that it will infuse every aspect of business and product development, and announced NVIDIA's plans to acquire ten Millennium Supercomputer systems based on the NVIDIA GB200 NVL72 platform to accelerate the company's own chip-design workflows.

In related news, the open-source OpenSearch project has released version 3.0, which adds GPU acceleration for AI workloads through its new OpenSearch Vector Engine. The update leverages NVIDIA GPUs to improve search performance on large-scale vector workloads and to reduce index build times, addressing scalability issues common in vector databases. OpenSearch 3.0 also supports Anthropic PBC's Model Context Protocol, facilitating the integration of large language models with external data.
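For context on what the Vector Engine exposes to developers, here is a minimal sketch of creating and querying a k-NN index with the opensearch-py client. The host, index name, and tiny 3-dimensional vectors are placeholders (real embeddings come from a model and are typically hundreds of dimensions wide), and the GPU-accelerated index build in 3.0 is a server-side concern rather than a client API change.

```python
# Minimal sketch: a k-NN vector index with opensearch-py.
# Host, index name, and the 3-dim vectors are illustrative placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Create an index with a knn_vector field so it can serve similarity search.
client.indices.create(
    index="docs-vectors",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 3},
                "title": {"type": "text"},
            }
        },
    },
)

# Index one document; refresh so it is immediately searchable.
client.index(
    index="docs-vectors",
    body={"title": "hello", "embedding": [0.1, 0.2, 0.3]},
    refresh=True,
)

# Nearest-neighbor query: the k closest documents to the query vector.
results = client.search(
    index="docs-vectors",
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3], "k": 5}}},
    },
)
print(results["hits"]["hits"])
```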