@Simon Willison's Weblog
//
References: bsky.app, Simon Willison's Weblog
Google has released QAT (Quantization-Aware Training) optimized versions of its Gemma 3 models, aiming to make these powerful language models more accessible. Quantization shrinks the models substantially, cutting memory requirements enough that they can run on consumer-grade GPUs and even mobile devices while maintaining high quality, opening up the possibility of running large language models locally.
The key to this accessibility is Quantization-Aware Training (QAT), which simulates lower bit widths during training so the model adapts to those limits and minimizes the performance drop typically associated with lower precision. Google reports significant size reductions: the Gemma 3 27B model drops from 54GB to 14.1GB when quantized to int4 format, the 12B model shrinks to 6.6GB, the 4B model to 2.6GB, and the 1B model to a mere 0.5GB. Google partnered with Ollama, LM Studio, MLX, and llama.cpp to facilitate the use of these quantized models.
Simon Willison reports that the "gemma3:27b-it-qat" model, requiring 22GB of RAM, has become a favorite local model on his Mac, accessible from his phone via Ollama, Open WebUI, and Tailscale. Willison also noted that the snappily titled "Gemma-3-27b-it-qat-q4_0-gguf" sounds like a Wi-Fi password, but is in fact Google's leanest LLM yet.
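Google has not published its QAT training recipe, so purely as an illustration: "simulating lower bit widths during training" is typically done with fake quantization plus a straight-through estimator. A minimal PyTorch sketch of that idea (the function name and int4 range handling are my assumptions, not Google's code):

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    # Symmetric int4: 16 levels in [-8, 7]; the scale maps the
    # largest-magnitude weight onto the quantized range.
    scale = w.abs().max().clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    w_q = q * scale  # dequantize back to float
    # Straight-through estimator: the forward pass sees the quantized
    # values, the backward pass treats the rounding as identity.
    return w + (w_q - w).detach()

# One QAT-style step: quantize on the fly so the weights learn to
# tolerate int4 rounding error they will see at inference time.
w = torch.randn(4, 4, requires_grad=True)
loss = fake_quant_int4(w).pow(2).sum()
loss.backward()
print(w.grad.shape)  # gradients still flow to the float "shadow" weights
```

Because the rounding happens in the forward pass of every training step, the optimizer settles into weights that lose little accuracy when actually stored as int4, which is why the quantized checkpoints stay close to full-precision quality.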
@the-decoder.com
//
OpenAI has launched a significant update to ChatGPT, enhancing its memory capabilities to include the entire history of user conversations. This allows the AI model to draw on past interactions, providing more personalized and relevant responses. The upgrade is designed to make ChatGPT a more adaptable and long-term tool, evolving with users and understanding their preferences over time, across various modalities including text, voice, and image interactions.
ChatGPT Plus and Pro users will be the first to access the new memory feature, with rollout planned for Team, Enterprise, and Edu accounts in the coming weeks. The improved memory system includes two key components: "Reference saved memories," where users can explicitly direct ChatGPT to remember specific facts like names or preferences, and "Reference chat history," which allows the model to use context from prior conversations to adapt to a user's tone, goals, and interests. Users retain control over their information and can disable the memory function entirely or limit how ChatGPT references previous conversations.
With the update, OpenAI aims to create a more seamless, context-aware experience: ChatGPT can draw more naturally on past conversations, even in new chats, leading to more helpful and personalized responses. This enhancement positions ChatGPT alongside other digital assistants moving toward more versatile companions powered by generative AI. The rollout excludes countries in the European Economic Area, the UK, Switzerland, Norway, Iceland, and Liechtenstein.
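OpenAI has not said how "Reference chat history" is implemented. Purely as an illustration of the general pattern, memory layers like this are commonly built by retrieving relevant snippets from stored conversations and prepending them to the prompt. The toy Python sketch below shows that shape; every name is hypothetical, and a real system would use embeddings and a vector index rather than keyword overlap:

```python
from dataclasses import dataclass

@dataclass
class PastMessage:
    conversation_id: str
    text: str

def score(query: str, message: PastMessage) -> int:
    # Naive relevance: count shared lowercase words.
    q = set(query.lower().split())
    m = set(message.text.lower().split())
    return len(q & m)

def build_prompt(query: str, history: list[PastMessage], k: int = 3) -> str:
    # Pull the k most relevant past messages and prepend them as context.
    relevant = sorted(history, key=lambda m: score(query, m), reverse=True)[:k]
    context = "\n".join(f"- {m.text}" for m in relevant if score(query, m) > 0)
    return f"Relevant past conversation notes:\n{context}\n\nUser: {query}"

history = [
    PastMessage("c1", "User prefers concise answers with code examples"),
    PastMessage("c2", "User is planning a trip to Japan in May"),
]
print(build_prompt("What should I pack for Japan?", history))
```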
Alexey Shabanov@TestingCatalog
//
References: Data Phoenix, TestingCatalog
OpenAI is developing new features for ChatGPT, including "Moonshine" memory, which lets the AI recall past conversations for more context-aware interactions. A notification feed is in the works to keep users informed about new features and announcements, and the company is adding a "Whisper" button to the web app for voice dictation, a feature already available in the mobile and desktop versions.
Another upcoming feature is a "reasoning slider" that lets users control how much effort the model puts into completing a task; options like "think a little" or "think harder" simplify the choice for users unfamiliar with technical model differences (a rough API analogue is sketched after this item). While these features are not yet generally available, some users have reported early access to the Whisper button and Moonshine memory.
Separately, the company announced GPT-4o's new image generation capabilities, intended to replace DALL-E as the default image generation model. The feature, which allows for more precise and accurate image creation, was so enthusiastically adopted by subscribers that the resulting demand led OpenAI to delay its rollout to free users.
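The slider itself is a ChatGPT UI concept, but OpenAI's API already exposes a comparable control for its reasoning models: the reasoning_effort parameter. A minimal sketch using the official openai Python SDK (the model name and prompt are placeholders, and the slider may or may not map onto this parameter):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# reasoning_effort ("low" | "medium" | "high") trades speed and cost
# against how long the model deliberates -- roughly the trade-off a
# "think a little" / "think harder" slider would expose.
response = client.chat.completions.create(
    model="o3-mini",                 # a reasoning model that accepts this knob
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Plan a 3-day Kyoto itinerary."}],
)
print(response.choices[0].message.content)
```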
@the-decoder.com
//
Google's Gemini AI now possesses the ability to remember past conversations, allowing for more personalized and context-aware interactions. This feature is currently available to paying subscribers and enables Gemini to recall user preferences and past discussions, enhancing its capacity to provide relevant and coherent responses. Users can now ask Gemini to provide a summary of past discussions, eliminating the need to start from scratch or search for previous threads.
The new memory feature, which extends beyond remembering preferences to recalling entire conversations, is currently available in English for Gemini Advanced subscribers on the web and mobile, with expansion to more languages and to Google Workspace Business and Enterprise customers planned in the coming weeks. Google cautions that users must “check responses for accuracy.” Users can review, delete, or adjust how long Gemini retains their chats, and can disable all Gemini app activities through the MyActivity tab.
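The conversation recall lives in the Gemini app rather than the public API, but the multi-turn mechanics underneath are visible in the google-generativeai Python SDK, where prior turns are passed in explicitly as history. A small sketch (the model name, key, and turns are placeholders, and this is not the app's recall feature itself):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-flash")

# Seed the chat with earlier turns; the model answers with that context
# available, which is the same idea the app's recall feature automates
# across whole past conversations.
chat = model.start_chat(history=[
    {"role": "user", "parts": "I'm comparing e-bikes under $1500."},
    {"role": "model", "parts": "Noted -- budget e-bikes it is."},
])
reply = chat.send_message("Summarize what we discussed so far.")
print(reply.text)
```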