Google's Gemma 3 Models Optimized with Quantization-Aware Training

Simon Willison's Weblog

Google has released QAT (Quantization-Aware Training) optimized versions of its Gemma 3 models, aiming to make these language models more accessible. Quantization stores the model weights at lower precision, which shrinks the models enough to run on consumer-grade GPUs and even mobile devices. The approach sharply reduces memory requirements while maintaining high quality, making it practical to run large language models locally.
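To see roughly why lower precision shrinks memory use, a back-of-the-envelope estimate multiplies the parameter count by the bits stored per weight. The sketch below is only illustrative: the function name `weight_size_gb` is hypothetical, the parameter counts are the nominal figures implied by the model names, the unquantized baseline is assumed to use 16 bits per weight, and sizes are in decimal gigabytes for the weights alone.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts (assumption: read off the model names).
models = [("27B", 27e9), ("12B", 12e9), ("4B", 4e9), ("1B", 1e9)]

for name, params in models:
    baseline = weight_size_gb(params, 16)  # assumed 16-bit baseline
    int4 = weight_size_gb(params, 4)       # 4 bits per weight, no overhead
    print(f"{name}: ~{baseline:.1f} GB at 16 bits, ~{int4:.1f} GB at 4 bits")
```

Real quantized files usually come out somewhat larger than this 4-bit estimate, since formats typically store scale factors alongside the 4-bit values and may keep some tensors at higher precision.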

The key to this accessibility is Quantization-Aware Training (QAT), which simulates lower bit widths during training. This allows the model to adapt to these limits and minimize the performance drop typically associated with lower precision. Google reports significant model size reductions, with the Gemma 3 27B model dropping from 54GB to 14.1GB when quantized to int4 format. Similar reductions are seen across the Gemma 3 family, with the 12B model shrinking to 6.6GB, the 4B model to 2.6GB, and the 1B model to a mere 0.5GB.
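The idea of simulating a lower bit width can be illustrated with a "fake quantization" round trip: weights are snapped to a small integer grid and mapped straight back to floats, so the rest of the computation sees the rounding error it would face after real quantization. The following is a minimal sketch and not Google's actual recipe, which the article does not describe. The function name `fake_quantize` and the choice of symmetric, per-tensor scaling are assumptions made for illustration.

```python
def fake_quantize(weights, num_bits=4):
    """Snap weights to a signed num_bits integer grid, then map back to floats.

    Assumption: symmetric scaling with one scale for the whole list.
    """
    qmax = 2 ** (num_bits - 1) - 1   # largest integer level, 7 for 4 bits
    qmin = -(2 ** (num_bits - 1))    # smallest integer level, -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0

    simulated = []
    for w in weights:
        q = round(w / scale)            # quantize to the nearest integer level
        q = max(qmin, min(qmax, q))     # clamp to the representable range
        simulated.append(q * scale)     # dequantize back to a float
    return simulated


weights = [0.70, -0.35, 0.10, 0.02]
for original, approx in zip(weights, fake_quantize(weights)):
    print(f"{original:+.2f} -> {approx:+.4f}")
```

In a typical QAT setup, a step like this is applied in the forward pass during training while the full-precision weights continue to receive updates, which is how the model learns to tolerate the rounding.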

Google partnered with Ollama, LM Studio, MLX, and llama.cpp to facilitate the use of these quantized models. Simon Willison reports that the snappily titled "gemma3:27b-it-qat" model, which needs 22GB of RAM on his Mac, may be his new favorite local model and so far seems extremely capable. He runs it via Ollama, Open WebUI, and Tailscale so that he can also access it from his phone. The Decoder, for its part, remarked that "Gemma-3-27b-it-qat-q4_0-gguf" sounds like a Wi-Fi password but is in fact Google's leanest LLM yet.
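For readers who want to try the model from a script, one possible route is Ollama's local HTTP API. The sketch below assumes an Ollama server is running at its default local address, that the "gemma3:27b-it-qat" model has already been pulled, and that a non-streaming call to the `/api/generate` endpoint is acceptable. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma3:27b-it-qat"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gemma3:27b-it-qat"):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Explain quantization-aware training in one sentence."))
```

Keeping request construction separate from the network call makes it easy to point the same code at another host, for example a machine reached over a private network.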

References:

bsky.app: I think the snappily titled "gemma3:27b-it-qat" may be my new favorite local model - needs 22GB of RAM on my Mac (I'm running it via Ollama, Open WebUI and Tailscale so I can access it from my phone too) and so far it seems extremely capable
Simon Willison's Weblog: Gemma 3 QAT Models
the-decoder.com: Gemma-3-27b-it-qat-q4_0-gguf sounds like a Wi-Fi password but it's Google's leanest LLM yet

Classification:

HashTags: #Gemma3 #QAT #AIMemory
Company: Google
Target: AI Developers
Product: Gemma 3
Feature: Model Optimization
Type: AI
Severity: Informative
