References: @www.nextplatform.com, AWS News Blog, AIwire
Nvidia's latest Blackwell GPUs are rapidly gaining traction in cloud deployments, signaling a significant shift in AI hardware accessibility for businesses. Amazon Web Services (AWS) has announced its first UltraServer supercomputers, pre-configured systems powered by Nvidia's Grace CPUs and the new Blackwell GPUs. These U-P6e instances are available in full-rack and half-rack configurations and use fifth-generation NVLink (NVLink 5) ports to build large shared-memory compute complexes, yielding a single memory domain that spans up to 72 GPU sockets: effectively one massive, unified computing environment designed for intensive AI workloads.
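To software, such a rack-scale NVLink domain simply looks like one large pool of interconnected GPUs. As a rough illustration (this is not AWS's provisioning API, just a generic distributed-PyTorch sketch assuming a torchrun launch with one process per GPU), a single collective operation can span every GPU in the domain:

```python
# Minimal sketch: one collective across an NVL72-style GPU domain.
# Assumes the job is launched with torchrun (one process per GPU);
# rank and world-size env vars come from the launcher, not this script.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # NCCL uses NVLink where available
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a shard; one all-reduce aggregates across the whole domain.
    shard = torch.ones(1024, device="cuda") * dist.get_rank()
    dist.all_reduce(shard, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"world_size={dist.get_world_size()}, sum of ranks={shard[0].item():.0f}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```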
Adding to the growing adoption, CoreWeave, a prominent AI cloud provider, has become the first to offer NVIDIA RTX PRO 6000 Blackwell GPU instances at scale. This move promises substantial performance improvements for AI applications, with reports of up to 5.6x faster LLM inference compared to previous generations. CoreWeave's commitment to early deployment of Blackwell technology, including the NVIDIA GB300 NVL72 systems, is setting new benchmarks in rack-scale performance. By combining Nvidia's cutting-edge compute with its specialized AI cloud platform, CoreWeave aims to provide a more cost-efficient yet high-performing alternative for companies developing and scaling AI applications, supporting everything from training massive language models to multimodal inference.

The widespread adoption of Nvidia's Blackwell GPUs by major cloud providers like AWS and specialized AI platforms like CoreWeave underscores the increasing demand for advanced AI infrastructure. The trend is further highlighted by Nvidia's recent milestone of becoming the world's first $4 trillion company, a testament to its leading role in the AI revolution. Moreover, countries like Indonesia are actively pursuing sovereign AI goals, partnering with companies like Nvidia, Cisco, and Indosat Ooredoo Hutchison to establish AI Centers of Excellence. These initiatives aim to foster localized AI research, develop local talent, and drive innovation, ensuring that nations can harness the power of AI for economic growth and digital independence.
References: @www.linkedin.com
Nvidia has once again asserted its dominance in the AI training landscape with the release of the MLPerf Training v5.0 results. The company's Blackwell GB200 accelerators achieved record time-to-train scores, showcasing a significant leap in performance. This latest benchmark suite included submissions from various companies, but Nvidia's platform stood out, particularly in the most demanding large language model (LLM)-focused test involving Llama 3.1 405B pretraining. These results underscore the rapid growth and evolution of the AI field, with the Blackwell architecture demonstrably meeting the heightened performance demands of next-generation AI applications.
The MLPerf Training v5.0 results highlight Nvidia's commitment to versatility: it was the only platform to submit results across every benchmark. The at-scale submissions leveraged two AI supercomputers powered by the Blackwell platform: Tyche, built using GB200 NVL72 rack-scale systems, and Nyx, based on DGX B200 systems. Additionally, Nvidia collaborated with CoreWeave and IBM, utilizing a cluster of 2,496 Blackwell GPUs and 1,248 Grace CPUs. On the new Llama 3.1 405B pretraining benchmark, Blackwell delivered 2.2x greater performance than the previous-generation architecture at the same scale.

The performance gains are attributed to advancements in the Blackwell architecture, encompassing high-density liquid-cooled racks, 13.4TB of coherent memory per rack, and fifth-generation NVLink and NVLink Switch interconnect technologies for scale-up, as well as Quantum-2 InfiniBand networking for scale-out. These technological innovations, combined with the NVIDIA NeMo Framework software stack, are raising the bar for next-generation multimodal LLM training. While AMD showcased generational performance gains, Nvidia's GPUs reigned supreme, outpacing AMD's MI325X in the MLPerf benchmarks and solidifying Nvidia's position as a leader in AI training capabilities.
References: @www.linkedin.com
Nvidia's Blackwell GPUs have achieved top rankings in the latest MLPerf Training v5.0 benchmarks, demonstrating breakthrough performance across various AI workloads. The NVIDIA AI platform delivered the highest performance at scale on every benchmark, including the most challenging large language model (LLM) test, Llama 3.1 405B pretraining. Nvidia was the only vendor to submit results on all MLPerf Training v5.0 benchmarks, highlighting the versatility of the NVIDIA platform across a wide array of AI workloads, including LLMs, recommendation systems, multimodal LLMs, object detection, and graph neural networks.
The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. Nvidia collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs. The GB200 NVL72 systems achieved 90% scaling efficiency up to 2,496 GPUs, improving time-to-convergence by up to 2.6x compared to the Hopper-generation H100.

The new MLPerf Training v5.0 suite introduces a pretraining benchmark based on the Llama 3.1 405B generative AI system, the largest model yet included in the training benchmark suite. On this benchmark, Blackwell delivered 2.2x greater performance compared with the previous-generation architecture at the same scale. Furthermore, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA DGX B200 systems, powered by eight Blackwell GPUs, delivered 2.5x more performance compared with a submission using the same number of GPUs in the prior round. These performance gains reflect advancements in the Blackwell architecture and software stack, including high-density liquid-cooled racks, fifth-generation NVLink and NVLink Switch interconnect technologies, and NVIDIA Quantum-2 InfiniBand networking.
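For context, scaling efficiency here is just achieved speedup divided by ideal linear speedup. A minimal sketch of the arithmetic, using illustrative placeholder numbers rather than the actual MLPerf timings:

```python
# Scaling efficiency: achieved speedup relative to ideal linear scaling.
# The numbers below are illustrative placeholders, not real MLPerf timings.

def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    ideal_speedup = scaled_gpus / base_gpus          # e.g. 20x the GPUs
    achieved_speedup = base_minutes / scaled_minutes  # e.g. 18x faster
    return achieved_speedup / ideal_speedup

# A run that is ~18x faster on 20x the GPUs is scaling at ~90% efficiency.
print(scaling_efficiency(base_gpus=128, base_minutes=400,
                         scaled_gpus=2560, scaled_minutes=22.2))  # ~0.90
```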
References :
Heng Chi@AI Accelerator Institute
//
AI is revolutionizing data management and analytics across various platforms. Amazon Web Services (AWS) is facilitating the development of high-performance data pipelines for AI and Natural Language Processing (NLP) applications, utilizing services like Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker. These pipelines are essential for ingesting, processing, and providing output for training, inference, and decision-making at a large scale, leveraging AWS's scalability, flexibility, and cost-efficiency. AWS's auto-scaling options, seamless integration with ML and NLP workflows, and pay-as-you-go pricing model make it a preferred choice for businesses of all sizes.
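As a concrete illustration of the ingestion pattern, here is a minimal, hedged sketch of one pipeline stage: an S3-triggered Lambda handler that normalizes incoming text and writes it to a processed bucket. The bucket name and the normalization step are hypothetical placeholders, and a production pipeline would typically hang Glue jobs and SageMaker training or inference off the processed data downstream.

```python
# Minimal sketch of an S3-triggered ingestion step for an NLP pipeline.
# Bucket names and the normalization logic are illustrative placeholders;
# downstream Glue/SageMaker stages are omitted.
import json
import boto3

s3 = boto3.client("s3")
PROCESSED_BUCKET = "my-nlp-pipeline-processed"  # hypothetical bucket name

def handler(event, context):
    """Lambda entry point: fired by an S3 ObjectCreated event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Trivial normalization as a stand-in for real NLP preprocessing.
        doc = {"source_key": key, "text": " ".join(body.lower().split())}

        s3.put_object(
            Bucket=PROCESSED_BUCKET,
            Key=f"processed/{key}.json",
            Body=json.dumps(doc).encode("utf-8"),
        )
    return {"status": "ok"}
```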
Microsoft is simplifying data visualization with its new AI-powered tool, Data Formulator. This open-source application, developed by Microsoft Research, uses large language models (LLMs) to transform data into charts and graphs, even for users without extensive data-manipulation and visualization experience. Data Formulator differentiates itself with an intuitive user interface and hybrid interactions, bridging the gap between a visualization idea and its actual creation: users express intent through natural language supplemented with drag-and-drop interactions, while the AI handles the complex transformations in the background.

Yandex has released Yambda, the world's largest publicly available event dataset, to accelerate recommender-systems research and development. The dataset contains nearly 5 billion anonymized user interaction events from Yandex Music, offering a valuable resource for bridging the gap between academic research and industry-scale applications. Yambda addresses the scarcity of large, openly accessible datasets in the recommender-systems field, which has traditionally lagged behind other AI domains due to the sensitive nature and commercial value of behavioral data.

Additionally, Dremio is collaborating with Confluent's TableFlow to provide real-time analytics on Apache Iceberg data, enabling users to stream data from Kafka into queryable tables without manual pipelines, accelerating insights and reducing ETL complexity.
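To give a sense of what working with an event dataset like Yambda looks like in practice, here is a minimal exploration sketch. The file path and column names are hypothetical placeholders; the actual schema and distribution format are documented with the Yandex release.

```python
# Minimal sketch: exploring one shard of a large interaction-event dataset.
# The file path and column names are hypothetical placeholders; consult the
# Yambda release notes for the real schema.
import pandas as pd

events = pd.read_parquet("yambda/listens/shard-0000.parquet")  # hypothetical path

# Basic shape of the behavioral data: who interacted with what, and when.
print(events.shape)
print(events.head())

# Example aggregate: interaction counts per item, a common recsys starting point.
top_items = events.groupby("item_id").size().sort_values(ascending=False).head(10)
print(top_items)
```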
References :
Aaron Klotz@tomshardware.com
//
NVIDIA has recently announced a significant breakthrough in AI inference, achieving a new world record with its DGX B200 Blackwell node. This node, equipped with eight Blackwell GPUs, surpassed the 1,000 tokens per second (TPS) per user barrier while running Meta’s Llama 4 Maverick large language model. According to a report by Artificial Analysis, the DGX B200 node achieved 1,038 TPS/user, outperforming previous record holders like SambaNova, who achieved 792 TPS/user. This advancement showcases the immense capabilities of the Blackwell architecture and sets a new standard for AI performance.
NVIDIA achieved this record-breaking performance through extensive software optimization, using TensorRT together with Eagle-3-based speculative decoding (a simplified sketch of the technique appears below). These optimizations delivered a 4x performance uplift over Blackwell's prior best results. Further gains came from FP8 data types, optimized attention operations, and the Mixture of Experts (MoE) technique, improvements that boosted speed while maintaining response accuracy. At their highest-throughput configuration, NVIDIA's Blackwell GPUs reached 72,000 TPS per server.

In addition to AI performance, NVIDIA is also revolutionizing AI data center infrastructure through a collaboration with Navitas Semiconductor. Together they are introducing a new 800V HVDC architecture designed to replace the aging 54V systems currently in use. The new architecture is expected to deliver up to 5% better power efficiency and 70% lower maintenance costs, and the transition to 800V enables a 45% reduction in copper wire thickness, significantly lowering material use and weight while producing less heat and fewer losses.
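Speculative decoding, the core idea behind the Eagle-style speedup, lets a small draft model propose several tokens that the large target model then verifies in a single forward pass. The following is a simplified, framework-agnostic sketch of the greedy accept/reject loop, not NVIDIA's actual TensorRT implementation; both model objects are hypothetical stand-ins.

```python
def speculative_decode(draft_model, target_model, tokens, num_new, k=4):
    """Greedy speculative decoding sketch (illustration only).

    Assumed placeholder interfaces:
      draft_model.next_token(seq)       -> one cheap next-token guess
      target_model.verify(seq, start)   -> target's greedy choice at every
                                           position from `start` through
                                           len(seq), i.e. k+1 tokens computed
                                           in a single forward pass
    """
    target_len = len(tokens) + num_new
    while len(tokens) < target_len:
        base = len(tokens)

        # 1. Draft model cheaply proposes up to k tokens, one at a time.
        proposal = list(tokens)
        for _ in range(k):
            proposal.append(draft_model.next_token(proposal))

        # 2. Target model checks every proposed position in one pass.
        verified = target_model.verify(proposal, start=base)

        # 3. Keep the longest agreeing prefix, plus one token produced by
        #    the target itself, so each round emits at least one token.
        accepted = 0
        for i in range(k):
            if proposal[base + i] == verified[i]:
                accepted += 1
            else:
                break
        tokens = proposal[:base + accepted] + [verified[accepted]]
    return tokens[:target_len]
```

The speedup comes from step 2: verifying k drafted tokens costs roughly one forward pass of the large model, so whenever the draft model guesses well, several tokens are emitted for the price of one.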
References: @blogs.nvidia.com
Nvidia is significantly expanding its AI infrastructure initiatives by introducing NVLink Fusion, a technology that allows for the integration of non-Nvidia CPUs and AI accelerators with Nvidia's GPUs. This strategic move aims to provide customers with more flexible and customizable AI system configurations, broadening Nvidia's reach in the rapidly growing data center market. Key partnerships are already in place with companies like Qualcomm, Fujitsu, Marvell, and MediaTek, as well as design software firms Cadence and Synopsys, to foster a robust and open ecosystem. This approach allows Nvidia to remain central to the future of AI infrastructure, even when systems incorporate chips from other vendors.
Nvidia is also solidifying its presence in Taiwan, establishing a new office complex near Taipei that will serve as its overseas headquarters. The company is collaborating with Foxconn to build an "AI factory" in Taiwan that will utilize 10,000 Nvidia Blackwell GPUs. This facility is intended to bolster Taiwan's AI infrastructure and support local organizations in adopting AI technologies across various sectors. TSMC, Nvidia's primary chip supplier, plans to leverage this supercomputer for research and development, aiming to develop the next generation of AI chips.

Furthermore, Nvidia is working with Taiwan's National Center for High-Performance Computing (NCHC) to develop a new AI supercomputer. The system will feature over 1,700 GPUs, GB200 NVL72 rack-scale systems, and an HGX B300 system based on the Blackwell Ultra platform, all connected via Quantum InfiniBand networking. Expected to launch later this year, the supercomputer promises an eightfold performance increase over its predecessor for AI workloads, giving researchers enhanced capabilities to advance their projects. Academic institutions, government agencies, and small businesses in Taiwan will be able to apply for access to accelerate their AI initiatives.
References: @blogs.nvidia.com
Cadence has unveiled the Millennium M2000 Supercomputer, a powerhouse featuring NVIDIA Blackwell systems, aimed at revolutionizing AI-driven engineering design and scientific simulations. This supercomputer integrates NVIDIA HGX B200 systems and NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, coupled with NVIDIA CUDA-X software libraries and Cadence's optimized software. The result is a system capable of delivering up to 80 times higher performance compared to its CPU-based predecessors, marking a significant leap forward in computational capability for electronic design automation, system design, and life sciences workloads.
This collaboration between Cadence and NVIDIA is set to enable engineers to run massive simulations, leading to breakthroughs in fields including the design and development of autonomous machines, drug molecules, semiconductors, and data centers; the Millennium M2000 harnesses accelerated software from NVIDIA and Cadence for applications such as circuit simulation, computational fluid dynamics, data center design, and molecular design. NVIDIA founder and CEO Jensen Huang highlighted the transformative potential of AI, stating that it will infuse every aspect of business and product development, and announced NVIDIA's plans to acquire ten Millennium Supercomputer systems based on the NVIDIA GB200 NVL72 platform to accelerate the company's own chip-design workflows.

In related news, the open-source OpenSearch project has released version 3.0, which adds GPU acceleration for AI workloads through its new OpenSearch Vector Engine. The update leverages NVIDIA GPUs to improve search performance on large-scale vector workloads and to reduce index build times, addressing scalability issues common in vector databases. OpenSearch 3.0 also supports Anthropic PBC's Model Context Protocol, facilitating the integration of large language models with external data.
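For context on what the Vector Engine exposes to developers, here is a minimal sketch of creating and querying a k-NN index with the opensearch-py client. The host, index name, and tiny 3-dimensional vectors are placeholders (real embeddings come from a model and are typically hundreds of dimensions wide), and the GPU-accelerated index build in 3.0 is a server-side concern rather than a client API change.

```python
# Minimal sketch: a k-NN vector index with opensearch-py.
# Host, index name, and the 3-dim vectors are illustrative placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Create an index with a knn_vector field so it can serve similarity search.
client.indices.create(
    index="docs-vectors",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 3},
                "title": {"type": "text"},
            }
        },
    },
)

# Index one document; refresh so it is immediately searchable.
client.index(
    index="docs-vectors",
    body={"title": "hello", "embedding": [0.1, 0.2, 0.3]},
    refresh=True,
)

# Nearest-neighbor query: the k closest documents to the query vector.
results = client.search(
    index="docs-vectors",
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3], "k": 5}}},
    },
)
print(results["hits"]["hits"])
```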