News from the AI & ML world

DeeperML - #aitraining

@Rashi Shrivastava //
OpenAI is making significant strides in AI training and infrastructure. Sam Altman, CEO of OpenAI, envisions a new type of computer designed specifically for AI, suggesting current devices are not optimized for advanced AI capabilities. This new hardware aims to support always-on, context-aware AI assistants that can understand and act on a user's environment, schedule, and preferences in real-time. These AI-first computers could handle tasks like booking travel, summarizing content, and planning daily schedules through an intelligent interface.

OpenAI is also active in initiatives to improve AI literacy. The company is backing a new AI training academy for teachers, signaling a focus on integrating AI more effectively into education. It likewise continues to refine ChatGPT and its underlying models for diverse applications, including creating and grading assignments in the classroom, part of a broader push to embed these tools in everyday workflows, from coursework to coding.
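
The grading workflow those pieces describe reduces to a single chat-completion call with a rubric in the system prompt. Below is a minimal sketch, assuming the official openai Python client (v1+) and an OPENAI_API_KEY in the environment; the rubric text, model name, and prompts are illustrative, not the exact setup from the Towards AI article.

```python
# Hedged sketch: grade a student answer against a rubric via the OpenAI API.
# The rubric, model name, and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = "Score 0-10 for correctness and clarity; justify in two sentences."

def grade(assignment: str, answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model works here
        messages=[
            {"role": "system", "content": f"You are a grader. Rubric: {RUBRIC}"},
            {"role": "user", "content": f"Assignment: {assignment}\nAnswer: {answer}"},
        ],
    )
    return response.choices[0].message.content

print(grade("Explain gradient descent.", "It follows the negative gradient."))
```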

Adding to its suite of AI tools, OpenAI is reportedly preparing to launch a new AI-powered web browser. The browser is expected to rival Google Chrome and is designed around a ChatGPT-like interface: instead of traditional website navigation, interactions would be handled through the AI, streamlining tasks and potentially offering a more direct way to access information. Such a move could also give OpenAI direct access to user data, valuable both for improving its models and for targeted advertising.

Recommended read:
References:
  • www.tomsguide.com: OpenAI's Sam Altman says your computer isn’t built for AI — so it’s creating something entirely new
  • Towards AI: AI in the Classroom: Create and Grade Assignments with ChatGPT
  • Rashi Shrivastava: The Prompt: OpenAI Backs New AI Training Academy For Teachers

@www.artificialintelligence-news.com //
The UK government is launching a nationwide initiative to boost AI skills among workers and schoolchildren as it pushes to position itself as a leader in AI development and innovation. Prime Minister Keir Starmer announced a partnership with tech giants including NVIDIA, Google, Microsoft, and Amazon to train 7.5 million workers in artificial intelligence skills. The program will provide freely available training materials to businesses over the next five years, focusing on practical applications such as using chatbots and large language models to enhance productivity. The initiative aims to equip the UK workforce with the skills needed to thrive in an increasingly AI-driven economy.

As part of this comprehensive effort, all civil servants in England and Wales will receive practical AI training starting this autumn to enhance their work efficiency. The government aims to integrate AI into various aspects of public service, streamlining operations and improving productivity. Officials are already piloting AI tools, such as "Humphrey," named after the character from "Yes, Minister," to automate tasks and reduce the time spent on routine processes. The goal is to ensure that AI handles tasks where it can perform better, faster, and to the same high quality, freeing up civil servants for more complex and strategic work.

To support this AI skills drive, the government is also focusing on bolstering the UK's AI infrastructure. NVIDIA is investing in the UK, establishing an AI Technology Center to provide hands-on training in AI, data science, and accelerated computing. Cloud providers like Nscale and Nebius are deploying thousands of NVIDIA GPUs to enhance computational capabilities for research bodies, universities, and public services. The Prime Minister has pledged to invest approximately £1 billion in AI research compute by 2030, signaling a commitment to turning Britain into an AI superpower and attracting tech investment to stimulate economic growth.

Recommended read:
References:
  • techxplore.com: UK launches AI skills drive for workers and schoolchildren
  • www.theguardian.com: All civil servants in England and Wales to get AI training
  • NVIDIA Newsroom: UK Prime Minister, NVIDIA CEO Set the Stage as AI Lights Up Europe
  • techinformed.com: Nvidia can boost UK’s digital infrastructure, says Huang as Starmer promises £1bn for AI
  • www.artificialintelligence-news.com: UK tackles AI skills gap through NVIDIA partnership
  • blogs.nvidia.com: U.K. Prime Minister Keir Starmer’s ambition for Britain to be an “AI maker, not an AI taker,” is becoming a reality at London Tech Week.
  • ComputerWeekly.com: Starmer opens London Tech Week with £1bn AI boost
  • NVIDIA Newsroom: ‘AI Maker, Not an AI Taker’: UK Builds Its Vision With NVIDIA Infrastructure
  • Dataconomy: NVIDIA CEO Jensen Huang and U.K. Prime Minister Sir Keir Starmer opened London Tech Week at Olympia, signaling a national policy shift toward AI with investments in people, platforms, and partnerships. The U.K. will invest approximately £1 billion in AI research compute by 2030, starting this year.
  • insidehpc.com: LRZ to Acquire HPE-NVIDIA ‘Blue Lion’ Supercomputer

@www.linkedin.com //
Nvidia has once again asserted its dominance in the AI training landscape with the release of the MLPerf Training v5.0 results. The company's Blackwell GB200 accelerators achieved record time-to-train scores, showcasing a significant leap in performance. This latest benchmark suite included submissions from various companies, but Nvidia's platform stood out, particularly in the most demanding large language model (LLM)-focused test involving Llama 3.1 405B pretraining. These results underscore the rapid growth and evolution of the AI field, with the Blackwell architecture demonstrably meeting the heightened performance demands of next-generation AI applications.

The MLPerf Training v5.0 results also highlight the platform's versatility: Nvidia was the only vendor to submit results on every benchmark. The at-scale submissions leveraged two AI supercomputers powered by the Blackwell platform: Tyche, built using GB200 NVL72 rack-scale systems, and Nyx, based on DGX B200 systems. Additionally, Nvidia collaborated with CoreWeave and IBM, utilizing a cluster of 2,496 Blackwell GPUs and 1,248 Grace CPUs. On the new Llama 3.1 405B pretraining benchmark, Blackwell delivered 2.2x the performance of the previous-generation architecture at the same scale.

The performance gains are attributed to advancements in the Blackwell architecture, including high-density liquid-cooled racks, 13.4TB of coherent memory per rack, fifth-generation NVLink and NVLink Switch interconnects for scale-up, and Quantum-2 InfiniBand networking for scale-out. These innovations, combined with the NVIDIA NeMo Framework software stack, raise the bar for next-generation multimodal LLM training. While AMD showed generational performance gains of its own, Nvidia's GPUs outpaced the MI325X in the MLPerf benchmarks, reinforcing Nvidia's lead in AI training.
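
At the software level, "NVLink for scale-up, InfiniBand for scale-out" is handled transparently by NCCL during the gradient all-reduce of data-parallel training. Below is a minimal PyTorch DDP sketch of that pattern, not the NeMo Framework stack the submissions actually used; the model and batch sizes are placeholders.

```python
# Minimal PyTorch DDP sketch (launch with: torchrun --nproc_per_node=8 train.py).
# NCCL transparently uses NVLink inside a node and InfiniBand across nodes.
# The model and batch sizes are placeholders, not an MLPerf configuration.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda())  # stand-in for a real LLM
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()          # gradients are all-reduced across all ranks
        opt.step()
        opt.zero_grad()
        if dist.get_rank() == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```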

Recommended read:
References:
  • NVIDIA Newsroom: NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results
  • MLCommons: New MLCommons MLPerf Training v5.0 Benchmark Results Reflect Rapid Growth and Evolution of the Field of AI
  • IEEE Spectrum: Nvidia’s Blackwell Conquers Largest LLM Training Benchmark
  • www.aiwire.net: Blackwell GPUs Lift Nvidia to the Top of MLPerf Training Rankings
  • www.servethehome.com: MLPerf Training v5.0 is Out
  • IEEE Spectrum: On the LLM fine-tuning benchmarks, where fast networking aids the largest submissions, the system with the most GPUs came from Nvidia: a computer connecting 512 B200s.
  • ServeTheHome: The new MLPerf Training v5.0 results are dominated by NVIDIA Blackwell and Hopper, with an AMD Instinct MI325X appearing on one benchmark as well.

@www.linkedin.com //
Nvidia's Blackwell GPUs have achieved top rankings in the latest MLPerf Training v5.0 benchmarks, demonstrating breakthrough performance across various AI workloads. The NVIDIA AI platform delivered the highest performance at scale on every benchmark, including the most challenging large language model (LLM) test, Llama 3.1 405B pretraining. Nvidia was the only vendor to submit results on all MLPerf Training v5.0 benchmarks, highlighting the versatility of the NVIDIA platform across a wide array of AI workloads, including LLMs, recommendation systems, multimodal LLMs, object detection, and graph neural networks.

The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. Nvidia collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs. The GB200 NVL72 systems achieved 90% scaling efficiency up to 2,496 GPUs, improving time-to-convergence by up to 2.6x compared to Hopper-generation H100.
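
Scaling efficiency here is simply measured speedup divided by the ideal linear speedup from adding GPUs. A quick back-of-envelope check in Python; the 2,496-GPU scale and ~90% figure echo the article, but the timings are invented placeholders chosen to land near that number:

```python
# Back-of-envelope scaling-efficiency check for MLPerf-style results.
# Timings below are hypothetical; only the ~90% efficiency figure and
# the 2,496-GPU scale come from the reported numbers.
def scaling_efficiency(t_small, n_small, t_large, n_large):
    """Measured speedup divided by ideal (linear) speedup."""
    measured_speedup = t_small / t_large
    ideal_speedup = n_large / n_small
    return measured_speedup / ideal_speedup

# Hypothetical: 100 min on 312 GPUs vs 13.9 min on 2,496 GPUs (8x more).
eff = scaling_efficiency(t_small=100.0, n_small=312, t_large=13.9, n_large=2496)
print(f"scaling efficiency: {eff:.0%}")  # -> scaling efficiency: 90%
```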

The new MLPerf Training v5.0 benchmark suite introduces a pretraining benchmark based on the Llama 3.1 405B generative AI system, the largest model to be introduced in the training benchmark suite. On this benchmark, Blackwell delivered 2.2x greater performance compared with the previous-generation architecture at the same scale. Furthermore, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA DGX B200 systems, powered by eight Blackwell GPUs, delivered 2.5x more performance compared with a submission using the same number of GPUs in the prior round. These performance gains highlight advancements in the Blackwell architecture and software stack, including high-density liquid-cooled racks, fifth-generation NVLink and NVLink Switch interconnect technologies, and NVIDIA Quantum-2 InfiniBand networking.
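
For a sense of what the LoRA fine-tuning benchmark exercises: LoRA freezes the base weights and trains small low-rank adapter matrices, which is why an eight-GPU DGX can handle a 70B model. Here is a hedged, scaled-down sketch using Hugging Face PEFT, with GPT-2 as a stand-in so the example runs anywhere; the MLPerf task itself uses Llama 2 70B and a very different configuration.

```python
# Hedged sketch of LoRA fine-tuning; GPT-2 stands in for Llama 2 70B, and all
# hyperparameters are illustrative, not the MLPerf benchmark configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Low-rank adapters on the attention projections; base weights stay frozen.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["c_attn"])  # GPT-2's fused QKV projection
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of the full model

opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
batch = tok("MLPerf measures time-to-train, not throughput.", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal-LM loss
out.loss.backward()
opt.step()
```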

Recommended read:
References:
  • NVIDIA Newsroom: NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results
  • NVIDIA Technical Blog: NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0
  • IEEE Spectrum: Nvidia’s Blackwell Conquers Largest LLM Training Benchmark
  • NVIDIA Technical Blog: Reproducing NVIDIA MLPerf v5.0 Training Scores for LLM Benchmarks
  • AI News | VentureBeat: Nvidia says its Blackwell chips lead benchmarks in training AI LLMs
  • blogs.nvidia.com: NVIDIA RTX Blackwell GPUs Accelerate Professional-Grade Video Editing
  • MLCommons: New MLCommons MLPerf Training v5.0 Benchmark Results Reflect Rapid Growth and Evolution of the Field of AI
  • www.aiwire.net: MLPerf Training v5.0 results show Nvidia’s Blackwell GB200 accelerators sprinting through record time-to-train scores.
  • blogs.nvidia.com: NVIDIA is working with companies worldwide to build out AI factories, speeding the training and deployment of next-generation AI applications; the Blackwell architecture is built to meet their heightened performance requirements.
  • ServeTheHome: The new MLPerf Training v5.0 are dominated by NVIDIA Blackwell and Hopper results, but we also get AMD Instinct MI325X on a benchmark as well
  • AIwire: Blackwell GPUs Lift Nvidia to the Top of MLPerf Training Rankings
  • www.servethehome.com: MLPerf Training v5.0 is Out