News from the AI & ML world

DeeperML - #gke

@cloud.google.com //
Google is doubling down on AI innovation with new enhancements to Google Kubernetes Engine (GKE). Unveiled at Google Cloud Next 2025, these updates focus on simplifying AI adoption, scaling AI workloads efficiently, and optimizing AI inference performance. The enhancements address the challenges of deploying generative AI by providing tools for infrastructure selection, intelligent load balancing, and cost reduction, all while leveraging the power of Kubernetes. These advancements reflect the increasing demand for AI inference capabilities as businesses seek to solve real-world problems with AI.

Google has introduced several key features to streamline AI inference on GKE, including GKE Inference Quickstart, GKE TPU serving stack, and GKE Inference Gateway. GKE Inference Quickstart helps users select the optimal accelerator, model server, and scaling configuration, providing insights into instance types, model compatibility, and performance benchmarks. The GKE TPU serving stack, with support for Tensor Processing Units (TPUs) and vLLM, enables seamless portability across GPUs and TPUs. Furthermore, the GKE Inference Gateway introduces AI-aware scaling and load balancing techniques, resulting in significant improvements in serving costs, tail latency, and throughput.
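The Gateway's actual routing policy is Google's own, but the core idea behind AI-aware load balancing can be sketched in a few lines: instead of round-robin, route each request to the model-server replica with the most headroom on inference-specific signals such as KV-cache utilization and queue depth. The `Replica` metrics and `pick_replica` helper below are hypothetical illustrations, not the real GKE Inference Gateway API:

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    queue_depth: int             # pending requests on the model server (assumed metric)
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0-1.0 (assumed metric)

def pick_replica(replicas: list[Replica]) -> Replica:
    """Route to the replica with the most headroom: prefer low KV-cache
    utilization, break ties with the shorter request queue."""
    return min(replicas, key=lambda r: (r.kv_cache_utilization, r.queue_depth))

replicas = [
    Replica("pod-a", queue_depth=12, kv_cache_utilization=0.85),
    Replica("pod-b", queue_depth=3, kv_cache_utilization=0.40),
]
print(pick_replica(replicas).name)  # pod-b
```

Routing on model-server signals rather than raw connection counts is what lets this style of balancer improve tail latency: a replica whose KV cache is nearly full will stall new requests even if its connection count looks low.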

These GKE enhancements are designed to equip organizations for the agentic AI era, where multiple AI agents collaborate to accomplish tasks across various systems. Google is also offering tools like the Agent Development Kit (ADK), Agent Garden, and Agent Engine on Vertex AI to build and deploy custom agents. Google Cloud WAN, the company's internal advanced networking technology, is now available to customers, providing a high-performance, secure, and reliable network infrastructure for AI workloads. These efforts demonstrate Google Cloud's commitment to providing an open, comprehensive platform for production AI, enabling businesses to harness the power of AI with ease and efficiency.

References :
  • Practical Technology: Google reveals new Kubernetes and GKE enhancements for AI innovation
  • AI & Machine Learning: Details the new GKE inference capabilities that reduce costs and tail latency while increasing throughput.
  • www.itpro.com: Google Cloud Next 2025: Targeting easy AI
  • AI & Machine Learning: Kubernetes, your AI superpower: How Google Kubernetes Engine powers AI innovation
  • cloud.google.com: Delivering an application-centric, AI-powered cloud for developers and operators
  • Runtime: Google promotes k8s for AI; IBM says use a mainframe
staff@insideAI News //
Google Cloud has unveiled its seventh-generation Tensor Processing Unit (TPU), named Ironwood, at the recent Google Cloud Next 2025 conference. This new custom AI accelerator is specifically designed for inference workloads, marking a shift in Google's AI chip development strategy. Ironwood aims to meet the growing demands of "thinking models" like Gemini 2.5, addressing the increasing shift from model training to inference observed across the industry. According to Amin Vahdat, Google's Vice President and General Manager of ML, Systems, and Cloud AI, the aim is to enter the "age of inference" where AI agents proactively retrieve and generate data for insights.

Ironwood's technical specifications are impressive, offering substantial computational power and efficiency. When scaled to a pod of 9,216 chips, it can deliver 42.5 exaflops of compute, surpassing the world's fastest supercomputer, El Capitan, by more than 24 times. Each individual Ironwood chip boasts a peak compute of 4,614 teraflops. To manage the communication demands of modern AI, each Ironwood pod features Inter-Chip Interconnect (ICI) networking spanning a deployment of nearly 10 MW, and each chip is equipped with 192 GB of High Bandwidth Memory (HBM) with bandwidth reaching 7.2 terabytes per second.
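The pod-level figure follows directly from the per-chip spec. A quick back-of-the-envelope check, using the numbers reported above and the standard conversion of 10^6 teraflops per exaflop:

```python
# Sanity-check the pod-level compute figure from the per-chip spec.
chips_per_pod = 9_216
teraflops_per_chip = 4_614            # peak compute per Ironwood chip

pod_teraflops = chips_per_pod * teraflops_per_chip
pod_exaflops = pod_teraflops / 1_000_000   # 1 exaflop = 1e6 teraflops
print(round(pod_exaflops, 1))  # 42.5
```

The product works out to roughly 42.5 exaflops, matching the headline pod figure.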

This focus on inference is a response to the evolving AI landscape where proactive AI agents are becoming more prevalent. Ironwood is engineered to minimize data movement and latency on-chip while executing massive tensor manipulations, crucial for handling large language models and advanced reasoning tasks. Google emphasizes that Ironwood offers twice the performance per watt compared to its predecessor, Trillium, and is nearly 30 times more power efficient than Google’s first Cloud TPU from 2018, addressing the critical need for power efficiency in modern data centers.

References :
  • insideAI News: Google Launches ‘Ironwood’ 7th Gen TPU for Inference
  • venturebeat.com: Google's new Ironwood chip is 24x more powerful than the world’s fastest supercomputer
  • www.bigdatawire.com: Google Cloud Preps for Agentic AI Era with ‘Ironwood’ TPU, New Models and Software
  • The Next Platform: With “Ironwood” TPU, Google Pushes The AI Accelerator To The Floor
  • www.itpro.com: Google Cloud Next 2025: Targeting easy AI
Neel Patel@AI & Machine Learning //
Google Cloud and NVIDIA are collaborating to accelerate AI in healthcare by leveraging the NVIDIA BioNeMo framework and Google Kubernetes Engine (GKE). This partnership aims to speed up drug discovery and development by providing powerful infrastructure and tools for medical and pharmaceutical researchers. The NVIDIA BioNeMo platform is a generative AI framework that lets researchers model and simulate biological sequences and structures, a workload that places heavy demands on compute and calls for powerful GPUs and scalable infrastructure.

With BioNeMo running on GKE, medical organizations can pursue breakthroughs and new research with a speed and effectiveness previously out of reach. Google DeepMind has also introduced Gemini Robotics, AI models built on Google's Gemini foundation model that enhance robotics by integrating vision, language, and action. While AI isn't seen as a "silver bullet," Google DeepMind's Demis Hassabis emphasizes that its benefits will be undeniable within five to ten years, as evidenced by developments like AlphaFold 3, which accurately predicts the structure of molecules such as DNA and RNA.

References :
  • Compute: Accelerating AI in healthcare using NVIDIA BioNeMo Framework and Blueprints on GKE
  • IEEE Spectrum: With Gemini Robotics, Google Aims for Smarter Robots