@www.nextplatform.com
//
Nvidia's latest Blackwell GPUs are rapidly gaining traction in cloud deployments, signaling a significant shift in AI hardware accessibility for businesses. Amazon Web Services (AWS) has announced its first UltraServer supercomputers: pre-configured systems powered by Nvidia Grace CPUs and the new Blackwell GPUs. The U-P6e instances come in full-rack and half-rack configurations and use NVLink 5 ports to build large shared-memory compute complexes, with a memory domain spanning up to 72 GPU sockets, effectively a single unified computing environment designed for intensive AI workloads.
Adding to the growing adoption, CoreWeave, a prominent AI cloud provider, has become the first to offer NVIDIA RTX PRO 6000 Blackwell GPU instances at scale, with reports of up to 5.6x faster LLM inference compared to previous generations. CoreWeave's early deployment of Blackwell technology, including NVIDIA GB300 NVL72 systems, is setting new benchmarks in rack-scale performance. By pairing Nvidia's latest compute with its specialized AI cloud platform, CoreWeave aims to offer a cost-efficient yet high-performing alternative for companies building and scaling AI applications, from training massive language models to multimodal inference.

The adoption of Nvidia's Blackwell GPUs by major cloud providers like AWS and specialized AI platforms like CoreWeave underscores the increasing demand for advanced AI infrastructure. The trend is further highlighted by Nvidia's recent milestone of becoming the world's first $4 trillion company, a testament to its leading role in the AI revolution. Meanwhile, countries such as Indonesia are pursuing sovereign AI goals, partnering with Nvidia, Cisco, and Indosat Ooredoo Hutchison to establish AI Centers of Excellence. These initiatives aim to foster localized AI research, develop local talent, and drive innovation, helping nations harness AI for economic growth and digital independence.
staff@insideAI News
//
NVIDIA has reportedly set a new AI inference record, achieving over 1,000 tokens per second (TPS) per user with Meta's Llama 4 Maverick large language model. The result was achieved on NVIDIA's DGX B200 node, equipped with eight Blackwell GPUs, and was independently measured by the AI benchmarking service Artificial Analysis. NVIDIA's Blackwell architecture delivers substantial gains in processing power, enabling faster inference for large language models.
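The two headline numbers in this story describe different operating points: TPS per user is single-stream decode speed (latency-oriented), while TPS per server is aggregate throughput across many concurrent streams. A small back-of-envelope sketch, under that assumed interpretation, makes the relationship explicit:

```python
# Back-of-envelope sketch relating the quoted figures (assumed
# interpretation: TPS/user = single-stream decode rate, TPS/server =
# aggregate throughput; the two are separate operating points, not
# achieved simultaneously).

tps_per_user = 1000        # reported per-user decode rate
tps_per_server = 72000     # reported maximum-throughput configuration

# Inter-token latency implied by the per-user figure:
per_token_latency_ms = 1000 / tps_per_user
print(per_token_latency_ms)   # 1.0 ms between tokens for one stream

# Aggregate headroom of the throughput configuration over one stream:
ratio = tps_per_server / tps_per_user
print(ratio)                  # 72.0
```

The point of the arithmetic is that serving systems trade latency for throughput: the 72,000 TPS/server figure comes from batching many users, not from one user seeing 72,000 TPS.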
This record-breaking result came from extensive software optimizations, including TensorRT-LLM and a speculative decoding draft model trained with EAGLE-3 techniques; together these delivered a 4x speed-up over Blackwell's previous best results. NVIDIA also used FP8 data types for GEMM, Mixture of Experts (MoE), and attention operations to reduce model size and exploit the high FP8 throughput of Blackwell's Tensor Cores, and the company claims FP8 accuracy matches that of BF16 across many Artificial Analysis metrics. At its highest-throughput configuration, NVIDIA reports Blackwell reaching 72,000 TPS per server. This milestone underscores the considerable progress in AI inference made possible by NVIDIA's hardware and software innovations, clearing the way for more efficient and responsive AI applications.
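The speculative decoding mentioned above can be illustrated with a minimal sketch. This is a simplified stand-in, not NVIDIA's EAGLE-3 implementation: both "models" here are toy deterministic functions so the example is self-contained. The idea is that a cheap draft model proposes k tokens ahead, and the expensive target model verifies the whole proposal in what would be a single batched forward pass, keeping the longest correct prefix and substituting its own token at the first mismatch.

```python
def draft_model(prefix, k):
    """Cheap proposal model: toy deterministic next-token rule."""
    state = sum(prefix)
    out = []
    for _ in range(k):
        tok = (state * 31 + 7) % 97   # arbitrary toy dynamics
        out.append(tok)
        state += tok
    return out

def target_model(prefix):
    """Expensive target model: here it happens to agree with the draft."""
    return (sum(prefix) * 31 + 7) % 97

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    rounds = 0                         # each round = one target-model pass
    while len(seq) - len(prompt) < n_tokens:
        proposal = draft_model(seq, k)
        rounds += 1
        accepted = []
        for tok in proposal:
            expected = target_model(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token verified
            else:
                accepted.append(expected)  # target's correction; stop here
                break
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_tokens], rounds

tokens, rounds = speculative_decode([1, 2, 3], n_tokens=8, k=4)
print(len(tokens), rounds)  # 8 2 -> 8 tokens in 2 target passes, not 8
```

When the draft's guesses are usually right, the target model runs far fewer passes than tokens generated, which is where the latency win comes from; a stronger draft model (the role EAGLE-3 training plays) raises the acceptance rate.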