staff@insideAI News
//
MLCommons has released the latest MLPerf Inference v5.0 benchmark results, highlighting the growing importance of generative AI in the machine learning landscape. The new benchmarks feature tests for large language models (LLMs) like Llama 3.1 405B and Llama 2 70B Interactive, designed to evaluate how well systems perform in real-world applications requiring agentic reasoning and low-latency responses. This shift reflects the industry's increasing focus on deploying generative AI and the need for hardware and software optimized for these demanding workloads.
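Interactive LLM serving is typically judged on two latency metrics: time-to-first-token (TTFT) and time-per-output-token (TPOT). As a rough illustration of how those metrics can be measured from a streaming generator (a generic sketch, not MLPerf's actual harness; all names are placeholders):

```python
import time

def measure_latency(stream):
    """Measure time-to-first-token (TTFT) and mean time-per-output-token
    (TPOT) for a token stream -- the two latency metrics interactive LLM
    serving benchmarks typically constrain."""
    start = time.perf_counter()
    first = None
    count = 0
    last = start
    for _ in stream:                  # stream yields tokens as generated
        now = time.perf_counter()
        if first is None:
            first = now - start       # TTFT: delay before the first token
        count += 1
        last = now
    # TPOT: average gap between tokens after the first one arrives
    tpot = (last - start - first) / max(count - 1, 1)
    return first, tpot

# Toy usage with a generator standing in for a model's token stream:
def fake_stream(n=8, delay=0.02):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

ttft, tpot = measure_latency(fake_stream())
print(f"TTFT: {ttft*1000:.1f} ms, TPOT: {tpot*1000:.1f} ms")
```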
The v5.0 results reveal significant performance improvements driven by advances in both hardware and software. The median submitted score for Llama 2 70B has doubled compared to a year ago, and the best score is 3.3 times faster than the best result from Inference v4.0. These gains are attributed to innovations such as support for lower-precision computation formats like FP4, which allow large models to be processed more efficiently. The MLPerf Inference benchmark suite evaluates machine learning performance in a way that is architecture-neutral, reproducible, and representative of real-world workloads.
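To make the FP4 idea concrete, here is a minimal Python sketch of round-to-nearest quantization onto the E2M1 value grid used by 4-bit floating point. The single per-tensor scale and the function name are illustrative assumptions, not any vendor's actual kernel:

```python
import numpy as np

# Representable magnitudes of the E2M1 (4-bit float) format; with a sign
# bit this yields the 16 codes of FP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round-to-nearest FP4 quantization with one per-tensor scale.
    Returns the dequantized tensor so the error can be inspected."""
    scale = max(np.abs(x).max(), 1e-12) / FP4_GRID[-1]  # largest |x| -> 6.0
    scaled = np.abs(x) / scale
    # Snap each magnitude to the nearest representable FP4 value.
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

weights = np.random.randn(4, 4).astype(np.float32)
q = quantize_fp4(weights)
print("max abs error:", np.abs(weights - q).max())
```

Storing weights on a grid this coarse roughly halves memory and bandwidth relative to FP8, which is where the inference-throughput gains come from.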
@www.intel.com
//
NVIDIA's Blackwell platform dominated the latest MLPerf Inference v5.0 benchmarks, showcasing significant performance improvements in AI reasoning. The NVIDIA GB200 NVL72 system, which connects 72 Blackwell GPUs, achieved up to 30x higher throughput on the Llama 3.1 405B benchmark than the NVIDIA H200 NVL8 submission. The gain was driven by more than triple the performance per GPU and a 9x larger NVIDIA NVLink interconnect domain. The latest MLPerf results reflect the industry's shift toward reasoning in AI inference.
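The two cited factors compose multiplicatively, which offers a quick sanity check of the headline number:

```python
# Back-of-the-envelope check of the ~30x figure from the two factors
# NVIDIA cites: per-GPU speedup and NVLink-domain size.
per_gpu_speedup = 3.3     # "more than triple the performance per GPU"
domain_growth = 72 / 8    # NVL72 (72 GPUs) vs. NVL8 (8 GPUs) = 9x
print(per_gpu_speedup * domain_growth)  # 29.7 -- roughly the reported 30x
```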
Alongside this achievement, NVIDIA is open-sourcing the KAI Scheduler, a Kubernetes GPU scheduling solution, as part of its commitment to open-source AI innovation. Previously a core component of the Run:ai platform, KAI Scheduler is now available under the Apache 2.0 license. The solution is designed to address the unique challenges of managing AI workloads that use both GPUs and CPUs; according to NVIDIA, it helps manage fluctuating GPU demands, which traditional resource schedulers struggle to handle.
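For context, a Kubernetes pod opts into an alternative scheduler by naming it in its spec. The sketch below uses the official Kubernetes Python client; the scheduler name and container image are illustrative assumptions (a real deployment may also require queue labels), so consult the KAI Scheduler documentation for the actual registered names:

```python
# Hypothetical sketch: submitting a GPU pod routed to a non-default
# scheduler. "kai-scheduler" and the image tag are assumptions for
# illustration, not verified deployment values.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job-0"),
    spec=client.V1PodSpec(
        scheduler_name="kai-scheduler",   # route scheduling decisions to KAI
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:24.01-py3",
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # request one GPU
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```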
@www.intel.com
//
NVIDIA is making strides in both agentic AI and open-source initiatives. Jacob Liberman, director of product management at NVIDIA, explains how agentic AI bridges the gap between powerful AI models and practical enterprise applications. Enterprises are now deploying AI agents to free human workers from time-consuming and error-prone tasks, allowing them to focus on high-value work that requires creativity and strategic thinking. NVIDIA AI Blueprints help enterprises build their own AI agents.
NVIDIA has announced the open-source release of the KAI Scheduler, a Kubernetes-native GPU scheduling solution, under the Apache 2.0 license. Originally developed within the Run:ai platform, the scheduler is now available to the community while continuing to be packaged and delivered as part of the NVIDIA Run:ai platform. It is designed to optimize the scheduling of GPU resources and to tackle the challenges of managing AI workloads that span GPUs and CPUs.
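One classic example of such a challenge is gang scheduling: a distributed job should start only when all of its workers can be placed at once, because partially started jobs hold GPUs without making progress. The toy Python check below illustrates the all-or-nothing idea; it is a generic sketch, not KAI Scheduler's actual algorithm:

```python
# Toy gang-scheduling admission check: a distributed job is admitted only
# if ALL of its workers fit in the currently free GPU pool; otherwise none
# start, so idle workers never pin GPUs while waiting for peers.
def admit(jobs, free_gpus):
    started = []
    for name, workers, gpus_per_worker in jobs:
        need = workers * gpus_per_worker
        if need <= free_gpus:          # all-or-nothing placement
            free_gpus -= need
            started.append(name)
    return started, free_gpus

jobs = [("llm-finetune", 8, 1), ("eval-sweep", 4, 1), ("notebook", 1, 1)]
started, left = admit(jobs, free_gpus=8)
print(started, "GPUs left:", left)   # ['llm-finetune'] GPUs left: 0
```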