News from the AI & ML world

DeeperML

@cloud.google.com //
Google Cloud is enhancing its AI Hypercomputer to accelerate AI inference workloads, with a focus on maximizing performance and reducing cost for generative AI applications. At Google Cloud Next 25, Google shared updates to AI Hypercomputer's inference capabilities, including Ironwood, its newest Tensor Processing Unit (TPU), designed specifically for inference. Software enhancements include simple, performant inference using vLLM on TPU and the latest GKE inference capabilities, such as GKE Inference Gateway and GKE Inference Quickstart. With AI Hypercomputer, Google aims to support the next phase of AI's rapid evolution.

JetStream, Google's JAX-based inference engine, incorporates new performance optimizations, integrating Pathways for ultra-low-latency, multi-host, disaggregated serving. The sixth-generation Trillium TPU delivers 2.9x higher throughput for Llama 2 70B and 2.8x higher for Mixtral 8x7B compared to TPU v5e. By offering more choice for serving LLMs on TPU, JetStream maximizes performance and reduces inference cost; its improved throughput reaches 1703 tokens/s for Llama 3.1 405B on Trillium.
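As a rough illustration of how serving throughput translates into cost, the sketch below converts the reported 1703 tokens/s into a cost per million generated tokens. The hourly accelerator price used here is a hypothetical placeholder for illustration, not a published Trillium price:

```python
# Rough cost-per-million-tokens estimate from serving throughput.
# The throughput figure (1703 tokens/s) comes from the article; the
# hourly accelerator price is a hypothetical assumption, not a real
# Trillium price.

def cost_per_million_tokens(throughput_tok_s: float, hourly_cost_usd: float) -> float:
    """Return the serving cost in USD per one million generated tokens."""
    tokens_per_hour = throughput_tok_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Example: 1703 tokens/s at a hypothetical $10/hour
print(round(cost_per_million_tokens(1703, 10.0), 2))  # → 1.63
```

The relationship is linear: doubling sustained throughput at the same hourly price halves the cost per token, which is why throughput gains like the 2.9x figure above translate directly into lower inference cost.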

Google is also intensifying its efforts to combat online scams by integrating artificial intelligence across Search, Chrome, and Android. AI is central to Google's anti-scam strategy, blocking hundreds of millions of scam results daily and identifying more fraudulent pages. Gemini Nano provides instant detection of high-risk websites, helping counter new and evolving scams across platforms. Google has long used AI to detect and block scams, including fake tech support, fraudulent financial services, and phishing links. Recent updates to AI classifiers now allow the company to detect 20 times more scam pages, improving the quality of search results by reducing exposure to harmful sites.



References:
  • Compute: From LLMs to image generation: Accelerate inference workloads with AI Hypercomputer