Google DeepMind Releases Gemma 3n for On-Device AI

@learn.aisingapore.org //

Google DeepMind has unveiled Gemma 3n, a compact, efficient multimodal AI model designed for real-time, on-device use. It targets the growing demand for faster, smarter, and more private AI experiences that run directly on phones, tablets, and laptops. Running the model locally gives developers near-instant responsiveness, lower memory demands, and better user privacy. Gemma 3n is optimized for Android and Chrome and serves as the foundation for the next version of Gemini Nano.

Gemma 3n uses a Google DeepMind technique called Per-Layer Embeddings (PLE) that significantly reduces RAM usage. Although the two model sizes have raw parameter counts of 5B and 8B, they operate with dynamic memory footprints of just 2GB and 3GB, respectively. Larger models can therefore run on mobile devices, or be live-streamed from the cloud, with memory overhead comparable to that of smaller models. The model also uses advanced activation quantization and key-value cache (KVC) sharing to improve on-device performance and efficiency. According to Google, it starts responding approximately 1.5x faster on mobile, with significantly better quality, than Gemma 3 4B.
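To make the memory claim concrete, here is a back-of-the-envelope sketch of how weight memory scales with parameter count. It assumes the saving comes from keeping only part of the weights resident in fast memory. The helper name `weight_footprint_gb`, the 4-bit width, and the 50% resident fraction are all illustrative assumptions, not published Gemma 3n figures.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: int,
                        resident_fraction: float = 1.0) -> float:
    """Approximate memory (GB, with 1 GB = 1e9 bytes) for the resident weights.

    resident_fraction is the share of weights kept in fast memory; it is a
    hypothetical knob for illustration, not a documented Gemma 3n setting.
    """
    if not 0.0 <= resident_fraction <= 1.0:
        raise ValueError("resident_fraction must be between 0 and 1")
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes * resident_fraction / 1e9


# Illustrative inputs only: the bit width and resident fraction are assumptions.
for params in (5, 8):
    full = weight_footprint_gb(params, bits_per_weight=4)
    partial = weight_footprint_gb(params, bits_per_weight=4, resident_fraction=0.5)
    print(f"{params}B params: {full:.2f} GB fully resident, "
          f"{partial:.2f} GB with half resident")
```

The point of the arithmetic is only that the footprint depends on how many weights must be resident at once and at what precision, so a model with a large raw parameter count can still fit a small memory budget.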

Alongside Gemma 3n, Google is experimenting with Gemini Diffusion, a system that generates text using diffusion techniques rather than traditional token-by-token prediction. The approach, inspired by image generation, starts from noise and refines it over multiple passes to produce full sections of text at once. DeepMind says this leads to more consistent and logically connected output, making it particularly effective for tasks that require precision, coherence, and iteration, such as code generation and text editing. Gemini Diffusion reaches speeds of up to 2,000 tokens per second on programming tasks, which positions it as a fast and competitive alternative to autoregressive models.
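The contrast between the two generation styles can be sketched with a toy example. This is not Gemini Diffusion's algorithm: there is no noise schedule or learned model here, and the function names and the `per_pass` parameter are hypothetical. It only illustrates the control flow: an autoregressive loop needs one sequential step per token, while a refinement loop can fill several positions in each pass.

```python
def autoregressive_steps(target: list[str]) -> int:
    """Emit one token per step, left to right; return the number of steps."""
    output = []
    steps = 0
    for token in target:
        output.append(token)  # stand-in for a model predicting the next token
        steps += 1
    assert output == target
    return steps


def parallel_refinement_steps(target: list[str], per_pass: int) -> int:
    """Start from a fully masked draft and fill up to per_pass positions per pass."""
    if per_pass < 1:
        raise ValueError("per_pass must be at least 1")
    draft = [None] * len(target)  # None marks a position not yet resolved
    passes = 0
    while None in draft:
        unresolved = [i for i, tok in enumerate(draft) if tok is None]
        for i in unresolved[:per_pass]:
            draft[i] = target[i]  # stand-in for a denoiser's prediction
        passes += 1
    assert draft == target
    return passes


tokens = "def add ( a , b ) : return a + b".split()
print(autoregressive_steps(tokens))                   # 12 sequential steps
print(parallel_refinement_steps(tokens, per_pass=4))  # 3 passes
```

Fewer sequential passes is one plausible reason a diffusion-style generator could reach high token throughput, although real systems trade this against the cost of each pass.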

References:

Google DeepMind Blog: Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI
THE DECODER: Gemini Diffusion could be Google's most important I/O news that slipped under the radar
www.marktechpost.com: Google DeepMind Releases Gemma 3n: A Compact, High-Efficiency Multimodal AI Model for Real-Time On-Device Use
AI Talent Development: Following the exciting launches of Gemma 3 and Gemma 3 QAT, our family of state-of-the-art open models capable of running on a single cloud or desktop accelerator, we're pushing our vision for accessible AI even further.
Google DeepMind Blog: Gemma 3n is a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio, empowering developers to build live, interactive applications and sophisticated audio-centric experiences.

Classification:

HashTags: #Gemma3n #GoogleAI #MultimodalAI
Company: Google
Target: Mobile users
Product: Gemma
Feature: On-Device AI
Type: AI
Severity: Informative
