News from the AI & ML world
Google has announced implicit caching in Gemini 2.5, a new feature designed to reduce developer costs. Cached tokens are billed at a 75 percent discount, and that discount is now applied automatically; previously, developers had to configure caching manually. Implicit caching detects and stores recurring content, so the repeated portion of a prompt does not have to be processed at full price on every request. The saving on any given request depends on how much of the prompt is served from the cache, with 75 percent as the upper bound.
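As a rough illustration of how a 75 percent discount on cached tokens turns into an overall saving, the arithmetic can be sketched as follows. The function names and the price of 1.00 per million input tokens are hypothetical placeholders, not actual Gemini pricing; only the 75 percent figure comes from the announcement.

```python
# Discount applied to cached input tokens, as reported in the announcement.
CACHED_DISCOUNT = 0.75


def input_cost(total_tokens: int, cached_tokens: int, price_per_million: float) -> float:
    """Input cost of one request when some tokens are served from the cache.

    price_per_million is a hypothetical price per one million input tokens.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached_tokens = total_tokens - cached_tokens
    # Cached tokens are billed at (1 - discount) of the normal rate.
    billable_tokens = uncached_tokens + cached_tokens * (1 - CACHED_DISCOUNT)
    return billable_tokens * price_per_million / 1_000_000


def savings_fraction(total_tokens: int, cached_tokens: int) -> float:
    """Fraction of the input cost saved compared with no caching at all."""
    return CACHED_DISCOUNT * cached_tokens / total_tokens


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens, 8,000 of them cached.
    print(input_cost(10_000, 8_000, price_per_million=1.00))
    print(savings_fraction(10_000, 8_000))
```

Under these assumptions, a request in which 80 percent of the input tokens are cached saves 60 percent of the input cost; the full 75 percent is reached only when the entire prompt is served from the cache.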
The new feature is particularly beneficial for applications that run prompts against the same long context or continue existing conversations. To maximize the benefit, Google recommends placing the stable part of a prompt, such as system instructions, at the start and adding user-specific input, such as questions, afterwards. Implicit caching applies to prompts of at least 1,024 tokens on Gemini 2.5 Flash and at least 2,048 tokens on Gemini 2.5 Pro. The functionality is now live, and developers can find more details and best practices in the Gemini API documentation.
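The recommended prompt layout can be sketched without calling any API. The helper names below (`build_prompt`, `meets_threshold`) and the sample strings are hypothetical; the token minimums are the ones quoted in the article, and the token count is assumed to come from whatever tokenizer or usage metadata the application already has.

```python
import os

# Minimum prompt sizes for implicit caching, as quoted in the article.
MIN_TOKENS = {"flash": 1024, "pro": 2048}


def build_prompt(stable_prefix: str, user_input: str) -> str:
    """Put the unchanging part first so consecutive requests share a prefix."""
    return f"{stable_prefix}\n\n{user_input}"


def meets_threshold(model_family: str, prompt_tokens: int) -> bool:
    """Check whether a prompt is long enough for implicit caching to apply."""
    return prompt_tokens >= MIN_TOKENS[model_family]


# Hypothetical stable content: system instructions plus a long reference text.
system = "You are a support assistant. Answer using the manual below.\n<manual text>"

first = build_prompt(system, "How do I reset my password?")
second = build_prompt(system, "How do I export my data?")

# Both prompts begin with the same stable text, which is what makes the
# second request a candidate for a cache hit.
shared = os.path.commonprefix([first, second])
print(shared.startswith(system))
```

If the user question came first instead, the two prompts would diverge almost immediately and share little or no prefix, which is why the ordering matters.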
This development builds on the overwhelmingly positive feedback on Gemini 2.5 Pro's coding and multimodal reasoning capabilities. Beyond UI-focused development, these improvements extend to other coding tasks such as code transformation, code editing, and developing complex agentic workflows. Simon Willison notes that, before this change, the 75% cached token discount had to be configured manually, and he sees potentially big cost savings now that it is applied automatically.
References:
- bsky.app: Gemini 2.5 now applies the 75% cached token discount automatically - previously you had to manually configure it Potentially big cost savings here for applications that run prompts against the same long context, or continue existing conversations
- the-decoder.com: Google introduces implicit caching in Gemini 2.5, aiming to cut developer costs by as much as 75 percent.
- Simon Willison: Gemini 2.5 now applies the 75% cached token discount automatically - previously you had to manually configure it Potentially big cost savings here for applications that run prompts against the same long context, or continue existing conversations
- simonwillison.net: This article discusses the new implicit caching feature in Gemini 2.5 Pro, which automatically caches previous results to reduce costs by up to 75%.
- thetechbasic.com: This article talks about Google's new tool called implicit caching that helps developers save money on repeated prompts.