News from the AI & ML world

DeeperML - #blackwellgpu

@www.nextplatform.com //
Nvidia's latest Blackwell GPUs are rapidly gaining traction in cloud deployments, signaling a significant shift in AI hardware accessibility for businesses. Amazon Web Services (AWS) has announced the general availability of its first UltraServers, pre-configured systems powered by NVIDIA Grace Blackwell GB200 superchips that pair Grace CPUs with the new Blackwell GPUs. These EC2 P6e-GB200 instances are available in full-rack and half-rack configurations and use NVLink 5 ports to build large shared-memory compute complexes, with a single memory domain spanning up to 72 GPU sockets: effectively one massive, unified computing environment designed for intensive AI workloads.
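
To software, a shared-memory GPU complex like this mostly shows up as ordinary device-to-device reachability. A minimal sketch of how one might probe that from PyTorch (an assumption here; the snippet checks CUDA peer-to-peer access, which NVLink-connected GPUs report, but it cannot by itself distinguish NVLink 5 from other interconnects):

```python
import torch

# Enumerate visible GPUs and check pairwise CUDA peer-to-peer access.
# On NVLink-connected systems each pair should report True, which is
# what lets frameworks treat the GPUs as one large memory domain.
# Note: this does not identify the interconnect generation.
n = torch.cuda.device_count()
print(f"visible GPUs: {n}")
for i in range(n):
    for j in range(n):
        if i != j and torch.cuda.can_device_access_peer(i, j):
            print(f"GPU {i} can directly access GPU {j}'s memory")
```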

Adding to the growing adoption, CoreWeave, a prominent AI cloud provider, has become the first to offer NVIDIA RTX PRO 6000 Blackwell GPU instances at scale. The move promises substantial performance gains for AI applications, with reports of up to 5.6x faster LLM inference than previous generations. CoreWeave's early deployment of Blackwell technology, including the NVIDIA GB300 NVL72 systems, is setting new benchmarks in rack-scale performance. By combining Nvidia's cutting-edge compute with its specialized AI cloud platform, CoreWeave aims to provide a cost-efficient yet high-performing alternative for companies developing and scaling AI applications, supporting everything from training massive language models to multimodal inference.

The widespread adoption of Nvidia's Blackwell GPUs by major cloud providers like AWS and specialized AI platforms like CoreWeave underscores the increasing demand for advanced AI infrastructure. This trend is further highlighted by Nvidia's recent milestone of becoming the world's first $4 trillion company, a testament to its leading role in the AI revolution. Moreover, countries like Indonesia are actively pursuing sovereign AI goals, partnering with companies like Nvidia, Cisco, and Indosat Ooredoo Hutchison to establish AI Centers of Excellence. These initiatives aim to foster localized AI research, develop local talent, and drive innovation, ensuring that nations can harness the power of AI for economic growth and digital independence.

References:
  • AWS News Blog: Amazon announces the general availability of EC2 P6e-GB200 UltraServers, powered by NVIDIA Grace Blackwell GB200 superchips that enable up to 72 GPUs with 360 petaflops of computing power for AI training and inference at the trillion-parameter scale.
  • AIwire: CoreWeave, Inc. today announced it is the first cloud platform to make NVIDIA RTX PRO 6000 Blackwell Server Edition instances generally available.
  • The Next Platform: Sizing Up AWS "Blackwell" GPU Systems Against Prior GPUs And Trainiums
staff@insideAI News //
NVIDIA has reportedly set a new AI world record, achieving over 1,000 tokens per second (TPS) per user with Meta's Llama 4 Maverick large language model. The result was achieved on a single NVIDIA DGX B200 node equipped with eight Blackwell GPUs and was independently measured by the AI benchmarking service Artificial Analysis. NVIDIA's Blackwell architecture delivers substantial gains in processing power, enabling faster inference for large language models.

This record-breaking result came from extensive software optimization, including TensorRT-LLM and a speculative decoding draft model trained with EAGLE-3 techniques; together these yielded a 4x speed-up over Blackwell's previous best results. NVIDIA also used the FP8 data type for GEMMs, Mixture of Experts (MoE), and attention operations to reduce model size and exploit the high FP8 throughput of Blackwell's Tensor Cores, and claims that accuracy with FP8 matches BF16 across many of the metrics measured by Artificial Analysis.
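
NVIDIA's exact pipeline isn't public beyond its blog post, but the core idea of speculative decoding is simple: a small draft model proposes several tokens cheaply, and the large target model verifies them, keeping the longest accepted prefix. A minimal greedy-acceptance sketch (the `propose` and `greedy_next` helpers are hypothetical stand-ins, not NVIDIA's EAGLE-3 implementation):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=64):
    """Greedy speculative decoding sketch.

    `draft.propose(tokens, k)` returns k candidate next tokens and
    `target.greedy_next(tokens)` returns the target model's greedy
    next token; both are hypothetical stand-ins. In a real system the
    target scores all k candidates in one batched forward pass, which
    is where the speed-up comes from.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        candidates = draft.propose(tokens, k)      # cheap: small draft model
        for tok in candidates:
            verified = target.greedy_next(tokens)  # conceptually one batched pass
            if verified == tok:
                tokens.append(tok)       # draft guessed right: accept
            else:
                tokens.append(verified)  # mismatch: keep the target's token
                break                    # discard remaining draft tokens
    return tokens[: len(prompt) + max_new]
```

Because every accepted token is one the target model would have produced anyway, the technique changes latency rather than output; the draft model only has to be right often enough to amortize its own cost.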

At its highest-throughput configuration, NVIDIA reports that Blackwell reaches 72,000 TPS per server. This milestone underscores the considerable progress in AI inference made possible by NVIDIA's combined hardware and software innovations, clearing the way for more efficient and responsive AI applications.
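
The two figures describe different operating points: 1,000 TPS/user is a latency number for a single interactive stream, while 72,000 TPS/server is aggregate throughput with many concurrent requests batched together. Back-of-envelope arithmetic (the concurrency level below is an illustrative assumption, not a published NVIDIA configuration):

```python
# Latency view: one user streaming at 1,000 tokens/s.
per_user_tps = 1_000
print(f"per-token latency: {1 / per_user_tps * 1e3:.1f} ms")  # 1.0 ms

# Throughput view: the server's aggregate rate at maximum batching.
server_tps = 72_000
# Illustrative assumption: if ~256 requests were batched together,
# each stream would see roughly this many tokens per second.
assumed_concurrency = 256
print(f"~{server_tps / assumed_concurrency:.0f} TPS per stream "
      f"at concurrency {assumed_concurrency}")
```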

References:
  • insideAI News: Details of AI Inference: NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Llama 4 Maverick
  • insidehpc.com: Details on NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta's Llama 4 Maverick
  • www.tomshardware.com: Reports that Nvidia has broken another AI world record, surpassing 1,000 TPS/user with a DGX B200 node boasting eight Blackwell GPUs inside.
  • insidehpc.com: NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs achieved more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model.
  • NVIDIA Technical Blog: NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over...
  • analyticsindiamag.com: Texas Instruments (TI) has announced a collaboration with NVIDIA to develop new power management and sensing technology aimed at supporting future high-voltage power systems in AI data centres.
  • www.servethehome.com: The Intel Xeon 6 with priority cores wins big at NVIDIA but there is a lot more going on in the release than meets the eye