@cloud.google.com
//
Google is doubling down on empowering AI innovation with new enhancements to its Google Kubernetes Engine (GKE). Unveiled at Google Cloud Next 2025, these updates focus on simplifying AI adoption, scaling AI workloads efficiently, and optimizing AI inference performance. The enhancements aim to address the challenges of deploying generative AI by providing tools for infrastructure selection, intelligent load balancing, and cost reduction, all while leveraging the power of Kubernetes. These advancements reflect the increasing demand for AI inference capabilities as businesses seek to solve real-world problems with AI.
Google has introduced several key features to streamline AI inference on GKE, including GKE Inference Quickstart, the GKE TPU serving stack, and GKE Inference Gateway. GKE Inference Quickstart helps users select the optimal accelerator, model server, and scaling configuration, providing insights into instance types, model compatibility, and performance benchmarks. The GKE TPU serving stack adds vLLM support for Tensor Processing Units (TPUs), enabling seamless workload portability across GPUs and TPUs. The GKE Inference Gateway introduces AI-aware scaling and load-balancing techniques, yielding significant improvements in serving costs, tail latency, and throughput. These GKE enhancements are designed to equip organizations for the agentic AI era, in which multiple AI agents collaborate to accomplish tasks across various systems. Google is also offering tools such as the Agent Development Kit (ADK), Agent Garden, and Agent Engine on Vertex AI for building and deploying custom agents. Google Cloud WAN, the company's internal advanced networking technology, is now available to customers, providing a high-performance, secure, and reliable network infrastructure for AI workloads. Together, these efforts demonstrate Google Cloud's commitment to providing an open, comprehensive platform for production AI.
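For readers unfamiliar with what "serving vLLM on GKE" looks like in practice, the setup is a standard Kubernetes Deployment. The sketch below is illustrative only: the container image is vLLM's public OpenAI-compatible server, while the model name, replica count, and GPU resource request are assumptions, not values from the announcement.

```yaml
# Minimal sketch: serving a model with vLLM on a GKE GPU node pool.
# Model name and resource sizing are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-inference
  template:
    metadata:
      labels:
        app: vllm-inference
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest            # vLLM's OpenAI-compatible server image
        args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]  # hypothetical model choice
        ports:
        - containerPort: 8000                     # vLLM's default HTTP port
        resources:
          limits:
            nvidia.com/gpu: "1"                   # one GPU; TPU node pools use TPU resource types instead
```

The GPU/TPU portability described above comes from vLLM abstracting the accelerator backend: the same Deployment shape targets a TPU node pool by swapping the resource request and node selector, not by rewriting the serving code.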
staff@insideAI News
//
Google has unveiled its seventh-generation Tensor Processing Unit (TPU), named "Ironwood," marking a significant advancement in AI accelerator technology. Designed specifically for AI inference workloads, Ironwood is Google's most performant and scalable custom AI accelerator to date. This launch is part of Google Cloud's strategy to lead in supplying AI models, applications, and infrastructure, capitalizing on its own substantial AI needs to drive homegrown infrastructure development. Google Cloud is positioning itself for the "agentic AI era," with Ironwood playing a pivotal role in enabling multiple AI systems to work together across platforms.
Google's Ironwood TPU comes in configurations of up to 9,216 liquid-cooled chips interconnected via Inter-Chip Interconnect (ICI) networking, drawing nearly 10 MW of power. The Ironwood architecture co-optimizes hardware and software for AI workloads, allowing developers to use Google's Pathways software stack to harness tens of thousands of Ironwood TPUs at once. The design philosophy marks a shift from models that retrieve real-time information for people to interpret toward models that proactively generate insights and interpretations themselves, which Google calls the "age of inference." Ironwood is built to handle the computational and communication demands of "thinking models" such as large language models and advanced reasoning tasks. A full 9,216-chip pod can deliver, by Google's claim, more than 24 times the compute of the world's no. 1 supercomputer, El Capitan. Separately, NVIDIA is collaborating with Google Cloud to bring agentic AI to enterprises that want to run the Google Gemini family of AI models locally, using the NVIDIA Blackwell HGX and DGX platforms along with NVIDIA Confidential Computing for data safety. This collaboration enables enterprises to innovate securely while maintaining data privacy.
Neel Patel@AI & Machine Learning
//
Google Cloud and NVIDIA are collaborating to accelerate AI in healthcare by pairing the NVIDIA BioNeMo framework with Google Kubernetes Engine (GKE). This partnership aims to speed up drug discovery and development by providing powerful infrastructure and tools for medical and pharmaceutical researchers. BioNeMo is a generative AI framework that lets researchers model and simulate biological sequences and structures, work that places heavy computational demands on GPUs and requires scalable infrastructure.
With BioNeMo running on GKE, medical organizations can pursue breakthroughs and new research at a speed and scale that were previously out of reach. Google DeepMind has also introduced Gemini Robotics, a family of AI models built on Google's Gemini foundation model that enhances robotics by integrating vision, language, and action. While AI is not seen as a "silver bullet," Google DeepMind's Demis Hassabis emphasizes that its benefits will be undeniable within five to ten years, as evidenced by developments like AlphaFold 3, which accurately predicts the structure of molecules such as DNA and RNA.