News from the AI & ML world

DeeperML - #reasoning

Eric Hal@techradar.com //
Google I/O 2025 saw the unveiling of 'AI Mode' for Google Search, signaling a significant shift in how the company approaches information retrieval and user experience. The new AI Mode, powered by the Gemini 2.5 model, is designed to offer more detailed results, personal context, and intelligent assistance. This upgrade aims to compete directly with the capabilities of AI chatbots like ChatGPT, providing users with a more conversational and comprehensive search experience. The rollout has commenced in the U.S. for both the browser version of Search and the Google app, although availability in other countries remains unconfirmed.

AI Mode brings several key features to the forefront, including Deep Search, Live Visual Search, and AI-powered agents. Deep Search allows users to delve into topics with unprecedented depth, running hundreds of searches simultaneously to generate expert-level, fully-cited reports in minutes. With Search Live, users can leverage their phone's camera to interact with Search in real-time, receiving context-aware responses from Gemini. Google is also bringing agentic capabilities to Search, allowing users to perform tasks like booking tickets and making reservations directly through the AI interface.

Google’s revamp of its AI search service appears to be a response to the growing popularity of AI-driven search experiences offered by companies like OpenAI and Perplexity. According to Gartner analyst Chirag Dekate, evidence suggests a greater reliance on search and AI-infused search experiences. As AI Mode rolls out, Google is encouraging website owners to optimize their content for AI-powered search by creating unique, non-commodity content and ensuring that their sites meet technical requirements and provide a good user experience.

Recommended read:
References :
  • Search Engine Journal: Google's new AI Mode in Search, integrating Gemini 2.5, aims to enhance user interaction by providing more conversational and comprehensive responses.
  • www.techradar.com: Google just got a new 'Deep Think' mode – and 6 other upgrades
  • WhatIs: Google expands Gemini model, Search as AI rivals encroach
  • www.tomsguide.com: Google Search gets an AI tab — here’s what it means for your searches
  • AI News | VentureBeat: Inside Google’s AI leap: Gemini 2.5 thinks deeper, speaks smarter and codes faster
  • Search Engine Journal: Google Gemini upgrades include Chrome integration, Live visual tools, and enhanced 2.5 models. Learn how these AI advances could reshape your marketing strategy.
  • Google DeepMind Blog: Gemini 2.5: Our most intelligent models are getting even better
  • learn.aisingapore.org: Updates to Gemini 2.5 from Google DeepMind
  • THE DECODER: Google upgrades Gemini 2.5 Pro with a new Deep Think mode for advanced reasoning abilities
  • www.techradar.com: I've been using Google's new AI mode for Search – here's how to master it
  • www.theguardian.com: Search engine revamp and Gemini 2.5 introduced at conference in latest showing tech giant is all in on AI on Tuesday unleashed another wave of technology to accelerate a year-long makeover of its search engine that is changing the way people get information and curtailing the flow of internet traffic to other websites.
  • LearnAI: Updates to Gemini 2.5 from Google DeepMind
  • www.analyticsvidhya.com: Google I/O 2025: AI Mode on Google Search, Veo 3, Imagen 4, Flow, Gemini Live, and More
  • techvro.com: Google AI Mode Promises Deep Search and Goes Beyond AI Overviews
  • THE DECODER: Google pushes AI-powered search with agents, multimodality, and virtual shopping
  • felloai.com: Google I/O 2025 Recap With All The Jaw-Dropping AI Announcements
  • Analytics Vidhya: Google I/O 2025: AI Mode on Google Search, Veo 3, Imagen 4, Flow, Gemini Live, and More
  • LearnAI: Gemini as a universal AI assistant
  • Fello AI: Google I/O 2025 Recap With All The Jaw-Dropping AI Announcements
  • AI & Machine Learning: Today at Google I/O, we're expanding that help enterprises build more sophisticated and secure AI-driven applications and agents
  • www.techradar.com: Google Gemini 2.5 Flash promises to be your favorite AI chatbot, but how does it compare to ChatGPT 4o?
  • www.laptopmag.com: From $250 AI subscriptions to futuristic glasses and search that talks back, here’s what people are saying about Tuesday's Google I/O.
  • www.tomsguide.com: Google’s Gemini AI can now access Gmail, Docs, Drive, and more to deliver personalized help — but it raises new privacy concerns.
  • Data Phoenix: Google updated its model lineup and introduced a 'Deep Think' reasoning mode for Gemini 2.5 Pro
  • Maginative: Google’s revamped Canvas, powered by the Gemini 2.5 Pro model, lets you turn ideas into apps, quizzes, podcasts, and visuals in seconds—no code required.
  • Tech News | Euronews RSS: The tech giant is introducing a new "AI mode" that will embed chatbot capabilities into its search engine to keep up with rivals like OpenAI's ChatGPT.
  • learn.aisingapore.org: Advancing Gemini’s security safeguards – Google DeepMind
  • Data Phoenix: Google has launched major Gemini updates, including free visual assistance via Gemini Live, new subscription tiers starting at $19.99/month, advanced creative tools like Veo 3 for video generation with native audio, and an upcoming autonomous Agent Mode for complex task management.
  • www.zdnet.com: Everything from Google I/O 2025 you might've missed: Gemini, smart glasses, and more
  • thetechbasic.com: Google now adds ads to AI Mode and AI Overviews in search
  • Google DeepMind Blog: Gemini 2.5: Our most intelligent models are getting even better

Last Week@Last Week in AI //
Anthropic is enhancing its Claude AI model through new integrations and security measures. A new Claude Neptune model is undergoing internal red team reviews to probe its robustness against jailbreaking and ensure its safety protocols are effective. The red team exercises are set to run until May 18, focusing particularly on vulnerabilities in the constitutional classifiers that underpin Anthropic’s safety measures, suggesting that the model is more capable and sensitive, requiring more stringent pre-release testing.

Anthropic has also launched a new feature allowing users to connect more apps to Claude, enhancing its functionality and integration with various tools. This new app connection feature, called Integrations, is available in beta for subscribers to Anthropic’s Claude Max, Team, and Enterprise plans, and soon Pro. It builds on the company's MCP protocol, enabling Claude to draw data from business tools, content repositories, and app development environments, allowing users to connect their tools to Claude, and gain deep context about their work.

Anthropic is also addressing the malicious uses of its Claude models, with a report outlining case studies on how threat actors have misused the models and the steps taken to detect and counter such misuse. One notable case involved an influence-as-a-service operation that used Claude to orchestrate social media bot accounts, deciding when to comment, like, or re-share posts. Anthropic has also observed cases of credential stuffing operations, recruitment fraud campaigns, and AI-enhanced malware generation, reinforcing the importance of ongoing security measures and sharing learnings with the wider AI ecosystem.

Recommended read:
References :

Coen van@Techzine Global //
ServiceNow has announced the launch of AI Control Tower, a centralized control center designed to manage, secure, and optimize AI agents, models, and workflows across an organization. Unveiled at Knowledge 2025 in Las Vegas, this platform provides a holistic view of the entire AI ecosystem, enabling enterprises to monitor and manage both ServiceNow and third-party AI agents from a single location. The AI Control Tower aims to address the growing complexity of managing AI deployments, giving users a central point to see all AI systems, their deployment status, and ensuring governance and understanding of their activities.

The AI Control Tower offers key benefits such as enterprise-wide AI visibility, built-in compliance and AI governance, end-to-end lifecycle management of agentic processes, real-time reporting, and improved alignment. It is designed to help AI systems administrators and other stakeholders monitor and manage every AI agent, model, or workflow within their system, providing real-time reporting for different metrics and embedded compliance and AI governance. The platform helps users understand the different systems by provider and type, improving risk and compliance management.

In addition to the AI Control Tower, ServiceNow introduced AI Agent Fabric, facilitating communication between AI agents and partner integrations. ServiceNow has also partnered with NVIDIA to engineer an open-source model, Apriel Nemotron 15B, designed to drive advancements in enterprise large language models (LLMs) and power AI agents that support various enterprise workflows. The Apriel Nemotron 15B, developed using NVIDIA NeMo and ServiceNow domain-specific data, is engineered for reasoning, drawing inferences, weighing goals, and navigating rules in real time, making it efficient and scalable for concurrent enterprise workflows.

Recommended read:
References :
  • thenewstack.io: Given that ServiceNow is, at its core, all about automating workflows for enterprises, it’s no surprise that
  • AI News | VentureBeat: ServiceNow also announced a way for agents to communicate with others along with its new observability platform.
  • Techzine Global: During Knowledge 2025 , ServiceNow launched AI Control Tower, a centralized control center for managing, securing, and optimizing AI agents, models, and workflows.
  • NVIDIA Blog: Your Service Teams Just Got a New Coworker — and It’s a 15B-Parameter Super Genius Built by ServiceNow and NVIDIA
  • www.zdnet.com: ServiceNow and Nvidia's new reasoning AI model raises the bar for enterprise AI agents
  • www.networkworld.com: ServiceNow unveiled a centralized command center the company says will enable enterprise customers to govern, manage, and secure AI agents from ServiceNow and other third-parties from a unified platform.
  • www.computerworld.com: Nvidia and ServiceNow have created an AI model that can help companies create learning AI agents to automate corporate workloads. The open-source Apriel model, available generally in the second quarter on HuggingFace, will help create AI agents that can make decisions around IT, human resources and customer-service functions.
  • blogs.nvidia.com: ServiceNow is accelerating enterprise AI with a new reasoning model built in partnership with NVIDIA — enabling AI agents that respond in real time, handle complex workflows and scale functions like IT, HR and customer service teams worldwide.
  • NVIDIA Newsroom: ServiceNow is accelerating enterprise AI with a new reasoning model built in partnership with NVIDIA — enabling AI agents that respond in real time, handle complex workflows and scale functions like IT, HR and customer service teams worldwide.
  • techstrong.ai: ServiceNow Inc. kicked off its annual artificial intelligence (AI) conference in Las Vegas Tuesday as it has in previous years -- with a fusillade of product announcements, partnerships and customer stories.
  • techstrong.ai: ServiceNow’s New AI Control Tower Commands AI Agents
  • Ken Yeung: ServiceNow Debuts AI Control Tower to Manage the Chaos of Enterprise AI Agents
  • Ken Yeung: ServiceNow and Nvidia have had a long-standing partnership building generative AI solutions for the enterprise. This week, at ServiceNow’s Knowledge customer conference, the two are introducing the latest fruits of their labor, a new large language model called Apriel Nemotron 15B with reasoning capabilities.
  • CIO Dive - Latest News: ServiceNow, Nvidia develop LLM to fuel enterprise agents
  • AI News: ServiceNow bets on unified AI to untangle enterprise complexity
  • www.artificialintelligence-news.com: ServiceNow bets on unified AI to untangle enterprise complexity
  • www.marktechpost.com: ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency

erichs211@gmail.com (Eric@techradar.com //
Google's powerful AI model, Gemini 2.5 Pro, has achieved a significant milestone by completing the classic Game Boy game Pokémon Blue. This accomplishment, spearheaded by software engineer Joel Z, demonstrates the AI's enhanced reasoning and problem-solving abilities. Google CEO Sundar Pichai celebrated the achievement online, highlighting it as a substantial win for AI development. The project showcases how AI can learn to handle complex tasks, requiring long-term planning, goal tracking, and visual navigation, which are vital components in the pursuit of general artificial intelligence.

Joel Z facilitated Gemini's gameplay over several months, livestreaming the AI's progress. While Joel is not affiliated with Google, his efforts were supported by the company's leadership. To enable Gemini to navigate the game, Joel used an emulator, mGBA, to feed screenshots and game data, like character position and map layout. He also incorporated smaller AI helpers, like a "Pathfinder" and a "Boulder Puzzle Solver," to tackle particularly challenging segments. These sub-agents, also versions of Gemini, were deployed strategically by the AI to manage complex situations, showcasing its ability to differentiate between routine and complicated tasks.

Google is also experimenting with transforming its search engine into a Gemini-powered chatbot via an AI Mode. This new feature, currently being tested with a small percentage of U.S. users, delivers conversational answers generated from Google's vast index, effectively turning Search into an answer engine. Instead of a list of links, AI Mode provides rich, visual summaries and remembers prior queries, directly competing with the search features of Perplexity and ChatGPT. While this shift could potentially impact organic SEO tactics, it signifies Google's commitment to integrating AI more deeply into its core products, offering users a more intuitive and informative search experience.

Recommended read:
References :
  • the-decoder.com: Google's reasoning LLM Gemini 2.5 Pro beats Pokémon Blue with a little help
  • thetechbasic.com: Google’s powerful AI model, Gemini 2.5 Pro, has finished playing the old Game Boy game Pokémon Blue.
  • www.techradar.com: Google's Gemini AI Is now a Pokémon Master
  • THE DECODER: Google's reasoning LLM Gemini 2.5 Pro beats Pokémon Blue with a little help
  • The Tech Basic: Google Gemini AI Beats Pokémon Blue With Help and Updates

Alexey Shabanov@TestingCatalog //
Meta is actively expanding the capabilities of its standalone Meta AI app, introducing new features focused on enhanced personalization and functionality. The company is developing a "Discover AIs" tab, which could serve as a hub for users to explore and interact with various AI assistants, potentially including third-party or specialized models. This aligns with Meta’s broader strategy to integrate personalized AI agents across its apps and hardware. Meta launched a dedicated Meta AI app powered by Llama 4 that focuses on offering more natural voice conversations and can leverage user data from Facebook and Instagram to provide tailored responses.

Meta is also testing a "reasoning" mode, suggesting the company aims to provide more transparent and advanced explanations in its AI assistant's responses. While the exact implementation remains unclear, the feature could emphasize structured logic or chain-of-thought capabilities, similar to developments in models from OpenAI and Google DeepMind. This would give users greater insight into how the AI derives its answers, potentially boosting trust and utility for complex queries.

Further enhancing user experience, Meta is working on new voice settings, including "Focus on my voice" and "Welcome message." "Focus on my voice" could improve the AI's ability to isolate and respond to the primary user's speech in environments with multiple speakers. The "Welcome message" feature might offer a customizable greeting or onboarding experience when the assistant is activated. These features are particularly relevant for Meta’s hardware ambitions, such as its Ray-Ban smart glasses and future AR devices, where voice interaction plays a critical role. To ensure privacy, Meta is also developing Private Processing for AI tools on WhatsApp, allowing users to leverage AI in a secure way.

Recommended read:
References :
  • Engineering at Meta: We are inspired by the possibilities of AI to help people be more creative, productive, and stay closely connected on WhatsApp, so we set out to build a new technology that allows our users around the world to use AI in a privacy-preserving way. We’re sharing an early look into Private Processing, an optional capability
  • TestingCatalog: Discover Meta AI's latest features: "Discover AIs" tab, "reasoning" mode, and new voice settings. Enhance your AI experience with personalized and advanced interactions.
  • Data Phoenix: Meta just launched a standalone Meta AI app powered by Llama 4 that focuses on offering more natural voice conversations and can leverage user data from Facebook and Instagram to provide tailored responses.
  • SiliconANGLE: Meta announces standalone AI app for personalized assistance

Matthias Bastian@THE DECODER //
References: THE DECODER , Ken Yeung , Analytics Vidhya ...
Microsoft has launched three new additions to its Phi series of compact language models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These models are designed to excel in complex reasoning tasks, including mathematical problem-solving, algorithmic planning, and coding, demonstrating that smaller AI models can achieve significant performance. The models are optimized to handle complex problems through structured reasoning and internal reflection, while also being efficient enough to run on lower-end hardware, including mobile devices, making advanced AI accessible on resource-limited devices.

Phi-4-reasoning, a 14-billion parameter model, was trained using supervised fine-tuning with reasoning paths from OpenAI's o3-mini. Phi-4-reasoning-plus enhances this with reinforcement learning and processes more tokens, leading to higher accuracy, although with increased computational cost. Notably, these models outperform larger systems, such as the 70B parameter DeepSeek-R1-Distill-Llama, and even surpass DeepSeek-R1 with 671 billion parameters on the AIME-2025 benchmark, a qualifier for the U.S. Mathematical Olympiad, highlighting the effectiveness of Microsoft's approach to efficient, high-performing AI.

The Phi-4 reasoning models show strong results in programming, algorithmic problem-solving, and planning tasks, with improvements in logical reasoning positively impacting general capabilities such as following prompts and answering questions based on long-form content. Microsoft employed a data-centric training strategy, using structured reasoning outputs marked with special tokens to guide the model's intermediate reasoning steps. The open-weight models have been released with transparent training details and are hosted on Hugging Face, allowing for public access, fine-tuning, and use in various applications under a permissive MIT license.

Recommended read:
References :
  • THE DECODER: Microsoft is expanding its Phi series of compact language models with three new variants designed for advanced reasoning tasks.
  • Ken Yeung: Microsoft’s New Phi-4 Variants Show Just How Far Small AI Can Go
  • AI News | VentureBeat: Microsoft Research has announced the release of Phi-4-reasoning-plus, an open-weight language model built for tasks requiring deep, structured reasoning.
  • Analytics Vidhya: Microsoft isn’t like OpenAI, Google, and Meta; especially not when it comes to large language models.
  • MarkTechPost: Despite notable advancements in large language models (LLMs), effective performance on reasoning-intensive tasks—such as mathematical problem solving, algorithmic planning, or coding—remains constrained by model size, training methodology, and inference-time capabilities.
  • the-decoder.com: Microsoft's Phi-4-reasoning models outperform larger models and run on your laptop or phone
  • www.tomsguide.com: Microsoft just unveiled new Phi-4 reasoning AI models — here's why they're a big deal
  • www.marktechpost.com: Despite notable advancements in large language models (LLMs), effective performance on reasoning-intensive tasks—such as mathematical problem solving, algorithmic planning, or coding—remains constrained by model size, training methodology, and inference-time capabilities.
  • www.windowscentral.com: Microsoft just launched expanded small language models (SLMs) based on its own Phi-4 AI.
  • simonwillison.net: This article discusses Microsoft's phi4-reasoning model, which generates 56 sentences of reasoning output in response to a simple prompt.
  • Data Phoenix: Microsoft launches Phi-4 'reasoning' models to celebrate Phi-3's first anniversary

Carl Franzen@AI News | VentureBeat //
Microsoft has announced the release of Phi-4-reasoning-plus, a new small, open-weight language model designed for advanced reasoning tasks. Building upon the architecture of the previously released Phi-4, this 14-billion parameter model integrates supervised fine-tuning and reinforcement learning to achieve strong performance on complex problems. According to Microsoft, the Phi-4 reasoning models outperform larger language models on several demanding benchmarks, despite their compact size. This new model pushes the limits of small AI, demonstrating that carefully curated data and training techniques can lead to impressive reasoning capabilities.

The Phi-4 reasoning family, consisting of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, is specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Phi-4-reasoning-plus, in particular, extends supervised fine-tuning with outcome-based reinforcement learning, which is targeted for improved performance in high-variance tasks such as competition-level mathematics. All models are designed to enable reasoning capabilities, especially on lower-performance hardware such as mobile devices.

Microsoft CEO Satya Nadella revealed that AI is now contributing to 30% of Microsoft's code. The open weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access. The model has been released under a permissive MIT license, enabling its use for broad commercial and enterprise applications, and fine-tuning or distillation, without restriction.

Recommended read:
References :
  • the-decoder.com: Microsoft's Phi-4-reasoning models outperform larger models and run on your laptop or phone
  • MarkTechPost: Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that Achieves Strong Performance on Complex Reasoning Tasks
  • THE DECODER: Microsoft's Phi-4-reasoning models outperform larger models and run on your laptop or phone
  • AI News | VentureBeat: The release demonstrates that with carefully curated data and training techniques, small models can deliver strong reasoning performance.
  • Maginative: Microsoft’s Phi-4 Reasoning Models Push the Limits of Small AI
  • www.marktechpost.com: Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that Achieves Strong Performance on Complex Reasoning Tasks
  • www.tomshardware.com: Microsoft's CEO reveals that AI writes up to 30% of its code — some projects may have all of its code written by AI
  • Ken Yeung: Microsoft’s New Phi-4 Variants Show Just How Far Small AI Can Go
  • www.tomsguide.com: Microsoft just unveiled new Phi-4 reasoning AI models — here's why they're a big deal
  • Techzine Global: Microsoft is launching three new advanced small language models as an extension of the Phi series. These models have reasoning capabilities that enable them to analyze and answer complex questions effectively.
  • Analytics Vidhya: Microsoft Launches Two Powerful Phi-4 Reasoning Models
  • www.analyticsvidhya.com: Microsoft Launches Two Powerful Phi-4 Reasoning Models
  • www.windowscentral.com: Microsoft Introduces Phi-4 Reasoning SLM Models — Still "Making Big Leaps in AI" While Its Partnership with OpenAI Frays
  • Towards AI: Phi-4 Reasoning Models
  • the-decoder.com: Microsoft's Phi 4 responds to a simple "Hi" with 56 thoughts
  • Data Phoenix: Microsoft has introduced three new small language models—Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—that reportedly deliver complex reasoning capabilities comparable to much larger models while maintaining efficiency for deployment across various computing environments.
  • AI News: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—that reportedly deliver complex reasoning capabilities comparable to much larger models while maintaining efficiency for deployment across various computing environments.

@www.quantamagazine.org //
Recent developments in the field of large language models (LLMs) are focusing on enhancing reasoning capabilities through reinforcement learning. This approach aims to improve model accuracy and problem-solving, particularly in challenging tasks. While some of the latest LLMs, such as GPT-4.5 and Llama 4, were not explicitly trained using reinforcement learning for reasoning, the release of OpenAI's o3 model shows that strategically investing in compute and tailored reinforcement learning methods can yield significant improvements.

Competitors like xAI and Anthropic have also been incorporating more reasoning features into their models, such as the "thinking" or "extended thinking" button in xAI Grok and Anthropic Claude. The somewhat muted response to GPT-4.5 and Llama 4, which lack explicit reasoning training, suggests that simply scaling model size and data may be reaching its limits. The field is now exploring ways to make language models work better, including the use of reinforcement learning.

One of the ways that researchers are making language models work better is to sidestep the requirement for language as an intermediary step. Language isn't always necessary, and that having to turn ideas into language can slow down the thought process. LLMs process information in mathematical spaces, within deep neural networks, however, they must often leave this latent space for the much more constrained one of individual words. Recent papers suggest that deep neural networks can allow language models to continue thinking in mathematical spaces before producing any text.

Recommended read:
References :
  • pub.towardsai.net: The article discusses the application of reinforcement learning to improve the reasoning abilities of LLMs.
  • Sebastian Raschka, PhD: This blog post delves into the current state of reinforcement learning in enhancing LLM reasoning capabilities, highlighting recent advancements and future expectations.
  • Quanta Magazine: This article explores the use of reinforcement learning to make Language Models work better, especially in challenging reasoning tasks.

@www.analyticsvidhya.com //
OpenAI recently unveiled its groundbreaking o3 and o4-mini AI models, representing a significant leap in visual problem-solving and tool-using artificial intelligence. These models can manipulate and reason with images, integrating them directly into their problem-solving process. This unlocks a new class of problem-solving that blends visual and textual reasoning, allowing the AI to not just see an image, but to "think with it." The models can also autonomously utilize various tools within ChatGPT, such as web search, code execution, file analysis, and image generation, all within a single task flow.

These models are designed to improve coding capabilities, and the GPT-4.1 series includes GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. GPT-4.1 demonstrates enhanced performance and lower prices, achieving a 54.6% score on SWE-bench Verified, a significant 21.4 percentage point increase from GPT-4o. This is a big gain in practical software engineering capabilities. Most notably, GPT-4.1 offers up to one million tokens of input context, compared to GPT-4o's 128k tokens, making it suitable for processing large codebases and extensive documentation. GPT-4.1 mini and nano also offer performance boosts at reduced latency and cost.

The new models are available to ChatGPT Plus, Pro, and Team users, with Enterprise and education users gaining access soon. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks. With Deep Research products and o3/o4-mini, AI-assisted search-based research is now effective.

Recommended read:
References :
  • bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
  • TestingCatalog: OpenAI’s o3 and o4‑mini bring smarter tools and faster reasoning to ChatGPT
  • thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini. These models feel incredibly smart.
  • venturebeat.com: OpenAI launches groundbreaking o3 and o4-mini AI models that can manipulate and reason with images, representing a major advance in visual problem-solving and tool-using artificial intelligence.
  • www.techrepublic.com: OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will get access next week.
  • the-decoder.com: OpenAI's o3 achieves near-perfect performance on long context benchmark
  • the-decoder.com: Safety assessments show that OpenAI's o3 is probably the company's riskiest AI model to date
  • www.unite.ai: Inside OpenAI’s o3 and o4‑mini: Unlocking New Possibilities Through Multimodal Reasoning and Integrated Toolsets
  • thezvi.wordpress.com: Discusses the release of OpenAI's o3 and o4-mini reasoning models and their enhanced capabilities.
  • Simon Willison's Weblog: OpenAI o3 and o4-mini System Card
  • Interconnects: OpenAI's o3: Over-optimization is back and weirder than ever. Tools, true rewards, and a new direction for language models.
  • techstrong.ai: Nobody’s Perfect: OpenAI o3, o4 Reasoning Models Have Some Kinks
  • bsky.app: It's been a couple of years since GPT-4 powered Bing, but with the various Deep Research products and now o3/o4-mini I'm ready to say that AI assisted search-based research actually works now
  • www.analyticsvidhya.com: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
  • pub.towardsai.net: TAI#149: OpenAI’s Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia Nemotron-H) Also, Grok-3 Mini Shakes Up Cost Efficiency, Codex, Cohere Embed 4, PerceptionLM & more.
  • Last Week in AI: Last Week in AI #307 - GPT 4.1, o3, o4-mini, Gemini 2.5 Flash, Veo 2
  • composio.dev: OpenAI o3 vs. Gemini 2. 5 Pro vs. o4-mini
  • Towards AI: Details about Open AI's Agentic O3 models

@www.analyticsvidhya.com //
OpenAI has recently launched its o3 and o4-mini models, marking a shift towards AI agents with enhanced tool-use capabilities. These models are specifically designed to excel in areas such as web search, code interpretation, and memory utilization, leveraging reinforcement learning to optimize their performance. The focus is on creating AI that can intelligently use tools in a loop, behaving more like a streamlined and rapid-response system for complex tasks. The development underscores a growing industry trend of major AI labs delivering inference-optimized models ready for immediate deployment.

The o3 model stands out for its ability to provide quick answers, often within 30 seconds to three minutes, a significant improvement over the longer response times of previous models. This speed is coupled with integrated tool use, making it suitable for real-world applications requiring quick, actionable insights. Another key advantage of o3 is its capability to manipulate image inputs using code, allowing it to identify key features by cropping and zooming, which has been demonstrated in tasks such as the "GeoGuessr" game.

While o3 demonstrates strengths across various benchmarks, tests have also shown variances in performance compared to other models like Gemini 2.5 and even its smaller counterpart, o4-mini. While o3 leads on most benchmarks and set a new state-of-the-art with 79.60% on the Aider polyglot coding benchmark, the costs are much higher. However, when used as a planner and GPT-4.1, the pair scored a new SOTA with 83% at 65% of the cost, though still expensive. One analysis notes the importance of context awareness when iterating on code, which Gemini 2.5 seems to handle better than o3 and o4-mini. Overall, the models represent OpenAI's continued push towards more efficient and agentic AI systems.

Recommended read:
References :
  • bdtechtalks.com: OpenAI's new reasoning models, o3 and o4-mini, enhance problem-solving capabilities and tool use, making them more effective than their predecessors.
  • Data Phoenix: OpenAI has launched o3 and o4-mini, which combine sophisticated reasoning capabilities with comprehensive tool integration.
  • THE DECODER: OpenAI's new language model o3 shows concrete signs of deception, manipulation and sabotage behavior for the first time.
  • thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini.
  • Simon Willison's Weblog: I'm surprised to see a combined System Card for o3 and o4-mini in the same document - I'd expect to see these covered separately. The opening paragraph calls out the most interesting new ability of these models (see also
  • techstrong.ai: Nobody’s Perfect: OpenAI o3, o4 Reasoning Models Have Some Kinks
  • Analytics Vidhya: OpenAI's o3 and o4-mini models have advanced reasoning capabilities. They have demonstrated success in problem-solving tasks in various areas, from mathematics to coding, with results showing potential advantages in efficiency and capabilities compared to prior generations.
  • pub.towardsai.net: Louie Peters analyzes OpenAI's o3, DeepMind's Gemma, and Nvidia's Nemotron-H, focusing on inference-optimized open-weight models.
  • Towards AI: Towards AI Editorial Team on OpenAI's o3 and o4-mini models, emphasizing tool use and agentic capabilities.
  • composio.dev: OpenAI o3 vs. Gemini 2.5 Pro vs. o4-mini

Michael Nuñez@venturebeat.com //
Google has unveiled Gemini 2.5 Flash, a new AI model designed to give businesses greater control over AI costs and performance. Available in preview through Google AI Studio and Vertex AI, Gemini 2.5 Flash introduces adjustable "thinking budgets," allowing developers to specify the amount of computational power the AI should use for reasoning. This innovative approach aims to strike a balance between advanced AI capabilities and cost-efficiency, addressing a key concern for businesses integrating AI into their operations. The model is also capable of generating SVGs.

The introduction of "thinking budgets" marks a strategic move by Google to deliver cost-effective AI solutions. Developers can now fine-tune the AI's processing power, allocating resources based on the complexity of the task at hand. With Gemini 2.5 Flash, the "thinking" capability can be turned on or off, creating a hybrid reasoning model that prioritizes speed and cost when needed. This flexibility allows businesses to optimize their AI usage and pay only for the brainpower they require.

Benchmarks demonstrate significant improvements in Gemini 2.5 Flash compared to the older Gemini 2.0 Flash model. Google has stated that the latest version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. The "thinking budget" feature offers fine-grained control over the maximum number of tokens a model can generate while thinking, ranging from 0 to 24,576 tokens. A higher budget allows the model to reason further to improve quality, but the model automatically decides how much to think based on the perceived task complexity.

Recommended read:
References :
  • venturebeat.com: Google’s new Gemini 2.5 Flash AI model introduces adjustable "thinking budgets" that let businesses pay only for the reasoning power they need, balancing advanced capabilities with cost efficiency.
  • Google DeepMind Blog: Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
  • TestingCatalog: Google integrates Veo 2 AI into Gemini Advanced, enabling subscribers to create 8-second, 720p videos for TikTok and YouTube. Download MP4s with SynthID watermark.
  • Simon Willison's Weblog: Start building with Gemini 2.5 Flash
  • www.zdnet.com: Google reveals Gemini 2.5 Flash, its 'most cost-efficient thinking model'
  • developers.googleblog.com: Google's Gemini 2.5 Flash has hybrid reasoning, can be turned on or off and provides the ability for developers to set budgets to find the right trade-off between cost, quality, and latency.
  • venturebeat.com: Google’s Gemini 2.5 Flash introduces ‘thinking budgets’ that cut AI costs by 600% when turned down
  • the-decoder.com: Google’s Gemini 2.5 Flash gives you speed when you need it and reasoning when you can afford it
  • THE DECODER: Provides information about the release of Gemini 2.5 Flash, highlighting its reasoning capabilities and cost-effectiveness.
  • TestingCatalog: Google launches Gemini 2.5 Flash model with hybrid reasoning
  • bsky.app: New LLM release from Google Gemini: Gemini 2.5 Flash (preview), which lets you set a budget for how many "thinking" tokens it can use. I got it to draw me some pelicans - it has very good taste in SVG styles and comments.
  • www.marktechpost.com: Google Unveils Gemini 2.5 Flash in Preview through the Gemini API via Google AI Studio and Vertex AI.
  • LearnAI: Start building with Gemini 2.5 Flash
  • www.infoworld.com: Google previews Gemini 2.5 Flash hybrid reasoning model
  • MarkTechPost: Google Unveils Gemini 2.5 Flash in Preview through the Gemini API via Google AI Studio and Vertex AI.
  • Google DeepMind Blog: Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.
  • learn.aisingapore.org: Start building with Gemini 2.5 Flash
  • www.marketingaiinstitute.com: This blog post highlights Google Cloud Next '25 event reveals, including Gemini 2.5 Pro, AI Agents, and more.
  • bsky.app: Gemini 2.5 Pro and Flash now have the ability to return image segmentation masks on command, as base64 encoded PNGs embedded in JSON strings I vibe coded an interactive tool for exploring this new capability - it costs a fraction of a cent per image
  • Last Week in AI: Last Week in AI discussing GPT 4.1 and Gemini 2.5 Flash
  • TestingCatalog: Testing Catalog about Gemini’s Scheduled Actions may offer AI task scheduling
  • The Official Google Blog: This model allows for adjustable thinking budgets, enabling users to control costs and choose the level of reasoning needed for specific tasks.
  • simonwillison.net: The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Gemini AI Studio product lead Logan Kilpatrick :
  • Analytics Vidhya: 7 Things Gemini 2.5 Pro Does Better Than Any Other Chatbot!
  • Last Week in AI: OpenAI’s new GPT-4.1 AI models focus on coding, Google’s newest Gemini AI model focuses on efficiency, and more!
  • Simon Willison: Turns out Gemini 2.5 Flash non-thinking mode can do the same trick at an even lower cost... 0.0119 cents (around 1/100th of a cent) Notes here, including how I upgraded my tool to use the non-thinking model by vibe coding o4-mini:
  • techcrunch.com: Google’s newest Gemini AI model focuses on efficiency, and more!
  • www.analyticsvidhya.com: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
  • Digital Information World: Google launches Gemini 2.5 Flash model with hybrid reasoning, multimodal support, and cost-effective token pricing.
  • IEEE Spectrum: This article discusses the release of Google's new leading-edge LLM, Gemini 2.5 Pro, which has attracted much attention and interest.
  • www.analyticsvidhya.com: This article explores the capabilities of Gemini 2.5 Pro and compares it to other AI chatbots.
  • Analytics Vidhya: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
  • TechHQ: Google unveils “reasoning dial†for Gemini 2.5 flash: thinking vs. cost
  • techhq.com: Google unveils “reasoning dial†for Gemini 2.5 flash: thinking vs. cost
  • Last Week in AI: Last Week in AI #307 - GPT 4.1, o3, o4-mini, Gemini 2.5 Flash, Veo 2
  • Towards AI: Google's Gemini 2.5 Flash model with reasoning control allows for greater precision and control in AI applications, optimizing resources and cost.
  • www.artificialintelligence-news.com: Google's Gemini 2.5 Flash model features a "thinking budget" that allows developers to restrict processing power for problem-solving, addressing concerns about excessive resource consumption.
  • AI News: Google has introduced an AI reasoning control mechanism for its Gemini 2.5 Flash model that allows developers to limit how much processing power the system expends on problem-solving. Released on April 17, this “thinking budget†feature responds to a growing industry challenge: advanced AI models frequently overanalyse straightforward queries, consuming unnecessary computational resources and driving

Chris McKay@Maginative //
OpenAI has released its latest AI models, o3 and o4-mini, designed to enhance reasoning and tool use within ChatGPT. These models aim to provide users with smarter and faster AI experiences by leveraging web search, Python programming, visual analysis, and image generation. The models are designed to solve complex problems and perform tasks more efficiently, positioning OpenAI competitively in the rapidly evolving AI landscape. Greg Brockman from OpenAI noted the models "feel incredibly smart" and have the potential to positively impact daily life and solve challenging problems.

The o3 model stands out due to its ability to use tools independently, which enables more practical applications. The model determines when and how to utilize tools such as web search, file analysis, and image generation, thus reducing the need for users to specify tool usage with each query. The o3 model sets new standards for reasoning, particularly in coding, mathematics, and visual perception, and has achieved state-of-the-art performance on several competition benchmarks. The model excels in programming, business, consulting, and creative ideation.

Usage limits for these models vary, with o3 at 50 queries per week, and o4-mini at 150 queries per day, and o4-mini-high at 50 queries per day for Plus users, alongside 10 Deep Research queries per month. The o3 model is available to ChatGPT Pro and Team subscribers, while the o4-mini models are used across ChatGPT Plus. OpenAI says o3 is also beneficial in generating and critically evaluating novel hypotheses, especially in biology, mathematics, and engineering contexts.

Recommended read:
References :
  • Simon Willison's Weblog: OpenAI are really emphasizing tool use with these: For the first time, our reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems.
  • the-decoder.com: OpenAI’s new o3 and o4-mini models reason with images and tools
  • venturebeat.com: OpenAI launches o3 and o4-mini, AI models that ‘think with images’ and use tools autonomously
  • www.analyticsvidhya.com: o3 and o4-mini: OpenAI’s Most Advanced Reasoning Models
  • www.tomsguide.com: OpenAI's o3 and o4-mini models
  • Maginative: OpenAI’s latest models—o3 and o4-mini—introduce agentic reasoning, full tool integration, and multimodal thinking, setting a new bar for AI performance in both speed and sophistication.
  • THE DECODER: OpenAI’s new o3 and o4-mini models reason with images and tools
  • Analytics Vidhya: o3 and o4-mini: OpenAI’s Most Advanced Reasoning Models
  • www.zdnet.com: These new models are the first to independently use all ChatGPT tools.
  • The Tech Basic: OpenAI recently released its new AI models, o3 and o4-mini, to the public. Smart tools employ pictures to address problems through pictures, including sketch interpretation and photo restoration.
  • thetechbasic.com: OpenAI’s new AI Can “See†and Solve Problems with Pictures
  • www.marktechpost.com: OpenAI Introduces o3 and o4-mini: Progressing Towards Agentic AI with Enhanced Multimodal Reasoning
  • MarkTechPost: OpenAI Introduces o3 and o4-mini: Progressing Towards Agentic AI with Enhanced Multimodal Reasoning
  • analyticsindiamag.com: Access to o3 and o4-mini is rolling out today for ChatGPT Plus, Pro, and Team users.
  • THE DECODER: OpenAI is expanding its o-series with two new language models featuring improved tool usage and strong performance on complex tasks.
  • gHacks Technology News: OpenAI released its latest models, o3 and o4-mini, to enhance the performance and speed of ChatGPT in reasoning tasks.
  • www.ghacks.net: OpenAI Launches o3 and o4-Mini models to improve ChatGPT's reasoning abilities
  • Data Phoenix: OpenAI releases new reasoning models o3 and o4-mini amid intense competition. OpenAI has launched o3 and o4-mini, which combine sophisticated reasoning capabilities with comprehensive tool integration.
  • Shelly Palmer: OpenAI Quietly Reshapes the Landscape with o3 and o4-mini. OpenAI just rolled out a major update to ChatGPT, quietly releasing three new models (o3, o4-mini, and o4-mini-high) that offer the most advanced reasoning capabilities the company has ever shipped.
  • THE DECODER: Safety assessments show that OpenAI's o3 is probably the company's riskiest AI model to date
  • shellypalmer.com: OpenAI Quietly Reshapes the Landscape with o3 and o4-mini
  • BleepingComputer: OpenAI details ChatGPT-o3, o4-mini, o4-mini-high usage limits
  • TestingCatalog: OpenAI’s o3 and o4‑mini bring smarter tools and faster reasoning to ChatGPT
  • simonwillison.net: Introducing OpenAI o3 and o4-mini
  • bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
  • bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
  • thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini. Greg Brockman (OpenAI): Just released o3 and o4-mini! These models feel incredibly smart. We’ve heard from top scientists that they produce useful novel ideas. Excited to see their …
  • thezvi.wordpress.com: OpenAI has upgraded its entire suite of models. By all reports, they are back in the game for more than images. GPT-4.1 and especially GPT-4.1-mini are their new API non-reasoning models.
  • felloai.com: OpenAI has just launched a brand-new series of GPT models—GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano—that promise major advances in coding, instruction following, and the ability to handle incredibly long contexts.
  • Interconnects: OpenAI's o3: Over-optimization is back and weirder than ever
  • www.ishir.com: OpenAI has released o3 and o4-mini, adding significant reasoning capabilities to its existing models. These advancements will likely transform the way users interact with AI-powered tools, making them more effective and versatile in tackling complex problems.
  • www.bigdatawire.com: OpenAI released the models o3 and o4-mini that offer advanced reasoning capabilities, integrated with tool use, like web searches and code execution.
  • Drew Breunig: OpenAI's o3 and o4-mini models offer enhanced reasoning capabilities in mathematical and coding tasks.
  • TestingCatalog: OpenAI’s o3 and o4-mini bring smarter tools and faster reasoning to ChatGPT
  • www.techradar.com: ChatGPT model matchup - I pitted OpenAI's o3, o4-mini, GPT-4o, and GPT-4.5 AI models against each other and the results surprised me
  • www.techrepublic.com: OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will get access next week.
  • Last Week in AI: OpenAI’s new GPT-4.1 AI models focus on coding, OpenAI launches a pair of AI reasoning models, o3 and o4-mini, Google’s newest Gemini AI model focuses on efficiency, and more!
  • techcrunch.com: OpenAI’s new reasoning AI models hallucinate more.
  • computational-intelligence.blogspot.com: OpenAI's new reasoning models, o3 and o4-mini, are a step up in certain capabilities compared to prior models, but their accuracy is being questioned due to increased instances of hallucinations.
  • www.unite.ai: unite.ai article discussing OpenAI's o3 and o4-mini new possibilities through multimodal reasoning and integrated toolsets.
  • Unite.AI: On April 16, 2025, OpenAI released upgraded versions of its advanced reasoning models.
  • Digital Information World: OpenAI’s Latest o3 and o4-mini AI Models Disappoint Due to More Hallucinations than Older Models
  • techcrunch.com: TechCrunch reports on OpenAI's GPT-4.1 models focusing on coding.
  • Analytics Vidhya: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
  • THE DECODER: OpenAI's o3 achieves near-perfect performance on long context benchmark.
  • the-decoder.com: OpenAI's o3 achieves near-perfect performance on long context benchmark
  • www.analyticsvidhya.com: AI models keep getting smarter, but which one truly reasons under pressure? In this blog, we put o3, o4-mini, and Gemini 2.5 Pro through a series of intense challenges: physics puzzles, math problems, coding tasks, and real-world IQ tests.
  • Simon Willison's Weblog: This post explores the use of OpenAI's o3 and o4-mini models for conversational AI, highlighting their ability to use tools in their reasoning process. It also discusses the concept of
  • Simon Willison's Weblog: The benchmark score on OpenAI's internal PersonQA benchmark (as far as I can tell no further details of that evaluation have been shared) going from 0.16 for o1 to 0.33 for o3 is interesting, but I don't know if it it's interesting enough to produce dozens of headlines along the lines of "OpenAI's o3 and o4-mini hallucinate way higher than previous models"
  • techstrong.ai: Techstrong.ai reports OpenAI o3, o4 Reasoning Models Have Some Kinks.
  • www.marktechpost.com: OpenAI Releases a Practical Guide to Identifying and Scaling AI Use Cases in Enterprise Workflows
  • Towards AI: OpenAI's o3 and o4-mini models have demonstrated promising improvements in reasoning tasks, particularly their use of tools in complex thought processes and enhanced reasoning capabilities.
  • Analytics Vidhya: In this article, we explore how OpenAI's o3 reasoning model stands out in tasks demanding analytical thinking and multi-step problem solving, showcasing its capability in accessing and processing information through tools.
  • pub.towardsai.net: TAI#149: OpenAI’s Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia…
  • composio.dev: OpenAI o3 vs. Gemini 2.5 Pro vs. o4-mini
  • Composio: OpenAI o3 and o4-mini are out. They are two reasoning state-of-the-art models. They’re expensive, multimodal, and super efficient at tool use.

Michael Nuñez@AI News | VentureBeat //
Anthropic has been at the forefront of investigating how AI models like Claude process information and make decisions. Their scientists developed interpretability techniques that have unveiled surprising behaviors within these systems. Research indicates that large language models (LLMs) are capable of planning ahead, as demonstrated when writing poetry or solving problems, and that they sometimes work backward from a desired conclusion rather than relying solely on provided facts.

Anthropic researchers also tested the "faithfulness" of CoT models' reasoning by giving them hints in their answers, and see if they will acknowledge it. The study found that reasoning models often avoided mentioning that they used hints in their responses. This raises concerns about the reliability of chains-of-thought (CoT) as a tool for monitoring AI systems for misaligned behaviors, especially as these models become more intelligent and integrated into society. The research emphasizes the need for ongoing efforts to enhance the transparency and trustworthiness of AI reasoning processes.

Recommended read:
References :
  • venturebeat.com: Anthropic scientists expose how AI actually ‘thinks’ — and discover it secretly plans ahead and sometimes lies
  • The Algorithmic Bridge: AI Is Learning to Reason. Humans May Be Holding It Back
  • THE DECODER: Anthropic study finds language models often hide their reasoning process
  • MarkTechPost: Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and the Limitations of Verbal AI Transparency in Reasoning Models
  • MarkTechPost: This AI Paper from Anthropic Introduces Attribution Graphs: A New Interpretability Method to Trace Internal Reasoning in Claude 3.5 Haiku
  • www.marktechpost.com: This AI Paper from Anthropic Introduces Attribution Graphs: A New Interpretability Method to Trace Internal Reasoning in Claude 3.5 Haiku

Ellie Ramirez-Camara@Data Phoenix //
Google has launched Gemini 2.5 Pro, hailed as its most intelligent "thinking model" to date. This new AI model excels in reasoning and coding benchmarks, featuring an impressive 1M token context window. Gemini 2.5 Pro is currently accessible to Gemini Advanced users, with integration into Vertex AI planned for the near future. The model has already secured the top position on the Chatbot Arena LLM Leaderboard, showcasing its superior performance in areas like math, instruction following, creative writing, and handling challenging prompts.

Gemini 2.5 Pro represents a new category of "thinking models" designed to enhance performance through reasoning before responding. Google reports that it achieved this level of performance by combining an enhanced base model with improved post-training techniques and aims to build these capabilities into all of their models. The model also obtained leading scores in math and science benchmarks, including GPQA and AIME 2025, without using test-time techniques. A significant focus for the Gemini 2.5 development has been coding performance, where Google reports that the new model excels at creating visual.

Recommended read:
References :
  • Data Phoenix: Google Unveils Gemini 2.5: Its Most Intelligent AI Model Yet
  • www.csoonline.com: Google adds end-to-end email encryption to Gmail
  • GZERO Media: Meet Isomorphic Labs, the Google spinoff that aims to cure you
  • www.tomsguide.com: Google Gemini could soon help your kids with their homework — here’s what we know
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using – and 4 reasons it matters for enterprise AI
  • www.techrepublic.com: Google’s Gemini 2.5 Pro is Better at Coding, Math & Science Than Your Favourite AI Model
  • TestingCatalog: Google plans new Gemini model launch ahead of Cloud Next event
  • Simon Willison's Weblog: Google's Gemini 2.5 Pro is currently the top model and, from , a superb model for OCR, audio transcription and long-context coding.
  • AI News | VentureBeat: Gemini 2.5 Pro is now available without limits and for cheaper than Claude, GPT-4o
  • eWEEK: Google has launched Gemini 2.5 Pro, its most intelligent "thinking model" to date.
  • THE DECODER: Google expands access to Gemini 2.5 Pro amid strong benchmark results
  • The Tech Basic: Google introduced its latest AI model, Gemini 2.5 Pro, in the market. The model exists specifically to perform difficult mathematical and coding operations. The system shows aptitude for solving difficult problems and logical reasoning. Many users praise the high speed and effectiveness of this model. However, the model comes with a high cost for its
  • bsky.app: Gemini 2.5 Pro pricing was announced today - it's cheaper than both GPT-4o and Claude 3.7 Sonnet I've updated my llm-gemini plugin to add support for the new paid model
  • The Cognitive Revolution: Scaling "Thinking": Gemini 2.5 Tech Lead Jack Rae on Reasoning, Long Context, & the Path to AGI
  • www.zdnet.com: Gemini Pro 2.5 is a stunningly capable coding assistant - and a big threat to ChatGPT

Matt Marshall@AI News | VentureBeat //
Microsoft is enhancing its Copilot Studio platform with AI-driven improvements, introducing deep reasoning capabilities that enable agents to tackle intricate problems through methodical thinking and combining AI flexibility with deterministic business process automation. The company has also unveiled specialized deep reasoning agents for Microsoft 365 Copilot, named Researcher and Analyst, to help users achieve tasks more efficiently. These agents are designed to function like personal data scientists, processing diverse data sources and generating insights through code execution and visualization.

Microsoft's focus includes securing AI and using it to bolster security measures, as demonstrated by the upcoming Microsoft Security Copilot agents and new security features. Microsoft aims to provide an AI-first, end-to-end security platform that helps organizations secure their future, one example being the AI agents designed to autonomously assist with phishing, data security, and identity management. The Security Copilot tool will automate routine tasks, allowing IT and security staff to focus on more complex issues, aiding in defense against cyberattacks.

Recommended read:
References :
  • Microsoft Security Blog: Learn about the upcoming availability of Microsoft Security Copilot agents and other new offerings for a more secure AI future.
  • www.zdnet.com: Designed for Microsoft's Security Copilot tool, the AI-powered agents will automate basic tasks, freeing IT and security staff to tackle more complex issues.

Maximilian Schreiner@THE DECODER //
Google DeepMind has announced Gemini 2.5 Pro, its latest and most advanced AI model to date. This new model boasts enhanced reasoning capabilities and improved accuracy, marking a significant step forward in AI development. Gemini 2.5 Pro is designed with built-in 'thinking' capabilities, enabling it to break down complex tasks into multiple steps and analyze information more effectively before generating a response. This allows the AI to deduce logical conclusions, incorporate contextual nuances, and make informed decisions with unprecedented accuracy, according to Google.

The Gemini 2.5 Pro has already secured the top position on the LMArena leaderboard, surpassing other AI models in head-to-head comparisons. This achievement highlights its superior performance and high-quality style in handling intricate tasks. The model also leads in math and science benchmarks, demonstrating its advanced reasoning capabilities across various domains. This new model is available as Gemini 2.5 Pro (experimental) on Google’s AI Studio and for Gemini Advanced users on the Gemini chat interface.

Recommended read:
References :
  • Google DeepMind Blog: Gemini 2.5: Our most intelligent AI model
  • Shelly Palmer: Google’s Gemini 2.5: AI That Thinks Before It Speaks
  • AI News: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date
  • Interconnects: Gemini 2.5 Pro and Google's second chance with AI
  • SiliconANGLE: Google introduces Gemini 2.5 Pro with chain-of-thought reasoning built-in
  • AI News | VentureBeat: Google releases ‘most intelligent model to date,’ Gemini 2.5 Pro
  • Analytics Vidhya: Gemini 2.5 Pro is Now #1 on Chatbot Arena with Impressive Jump
  • www.tomsguide.com: Google unveils Gemini 2.5 — claims AI breakthrough with enhanced reasoning and multimodal power
  • Fello AI: Google’s Gemini 2.5 Shocks the World: Crushing AI Benchmark Like No Other AI Model!
  • bdtechtalks.com: What to know about Google Gemini 2.5 Pro
  • TestingCatalog: Gemini 2.5 Pro sets new AI benchmark and launches on AI Studio and Gemini
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using – and 4 reasons it matters for enterprise AI
  • thezvi.wordpress.com: Gemini 2.5 is the New SoTA
  • www.infoworld.com: Google has introduced version 2.5 of its , which the company said offers a new level of performance by combining an enhanced base model with improved post-training.
  • Composio: Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison
  • Composio: Google dropped its best-ever creation, Gemini 2.5 Pro Experimental, on March 25. It is a stupidly incredible reasoning model shining on every The post first appeared on.
  • www.tomsguide.com: Gemini 2.5 Pro is now free to all users in surprise move
  • Analytics India Magazine: Did Google Just Build The Best AI Model for Coding?
  • www.zdnet.com: Everyone can now try Gemini 2.5 Pro - for free

Matt Marshall@AI News | VentureBeat //
Microsoft is introducing two new AI reasoning agents, Researcher and Analyst, to Microsoft 365 Copilot. These "first-of-their-kind" agents are designed to tackle complex problems by utilizing methodical thinking, offering users a more efficient workflow. The Researcher agent combines OpenAI’s deep research model with Microsoft 365 Copilot’s advanced orchestration and search capabilities to deliver insights with greater quality and accuracy, helping users perform complex, multi-step research.

Analyst, built on OpenAI’s o3-mini reasoning model, functions like a skilled data scientist and is optimized for advanced data analysis. It uses chain-of-thought reasoning to iteratively refine its analysis and provide high-quality answers, mirroring human analytical thinking. This agent can run Python to tackle complex data queries and can turn raw data scattered across spreadsheets into visualizations or revenue projections. Researcher and Analyst will be available to customers with a Microsoft 365 Copilot license in April as part of a new “Frontier” program.

Recommended read:
References :
  • AI News | VentureBeat: Microsoft announced Tuesday two significant additions to its Copilot Studio platform: deep reasoning capabilities that enable agents to tackle complex problems through careful, methodical thinking, and agent flows that combine AI flexibility with deterministic business process automation.
  • Source Asia: Introducing two, first-of-their-kind reasoning agents in Microsoft 365 Copilot.
  • www.zdnet.com: Microsoft releases its answer to OpenAI and Google's Deep Research.

Maximilian Schreiner@THE DECODER //
Google has unveiled Gemini 2.5 Pro, its latest and "most intelligent" AI model to date, showcasing significant advancements in reasoning, coding proficiency, and multimodal functionalities. According to Google, these improvements come from combining a significantly enhanced base model with improved post-training techniques. The model is designed to analyze complex information, incorporate contextual nuances, and draw logical conclusions with unprecedented accuracy. Gemini 2.5 Pro is now available for Gemini Advanced users and on Google's AI Studio.

Google emphasizes the model's "thinking" capabilities, achieved through chain-of-thought reasoning, which allows it to break down complex tasks into multiple steps and reason through them before responding. This new model can handle multimodal input from text, audio, images, videos, and large datasets. Additionally, Gemini 2.5 Pro exhibits strong performance in coding tasks, surpassing Gemini 2.0 in specific benchmarks and excelling at creating visually compelling web apps and agentic code applications. The model also achieved 18.8% on Humanity’s Last Exam, demonstrating its ability to handle complex knowledge-based questions.

Recommended read:
References :
  • SiliconANGLE: Google LLC said today it’s updating its flagship Gemini artificial intelligence model family by introducing an experimental Gemini 2.5 Pro version.
  • The Tech Basic: Google's New AI Models “Think” Before Answering, Outperform Rivals
  • AI News | VentureBeat: Google releases ‘most intelligent model to date,’ Gemini 2.5 Pro
  • Analytics Vidhya: We Tried the Google 2.5 Pro Experimental Model and It’s Mind-Blowing!
  • www.tomsguide.com: Google unveils Gemini 2.5 — claims AI breakthrough with enhanced reasoning and multimodal power
  • Google DeepMind Blog: Gemini 2.5: Our most intelligent AI model
  • THE DECODER: Google Deepmind has introduced Gemini 2.5 Pro, which the company describes as its most capable AI model to date. The article appeared first on .
  • intelligence-artificielle.developpez.com: Google DeepMind a lancé Gemini 2.5 Pro, un modèle d'IA qui raisonne avant de répondre, affirmant qu'il est le meilleur sur plusieurs critères de référence en matière de raisonnement et de codage
  • The Tech Portal: Google unveils Gemini 2.5, its most intelligent AI model yet with ‘built-in thinking’
  • Ars OpenForum: Google says the new Gemini 2.5 Pro model is its “smartest†AI yet
  • The Official Google Blog: Gemini 2.5: Our most intelligent AI model
  • www.techradar.com: I pitted Gemini 2.5 Pro against ChatGPT o3-mini to find out which AI reasoning model is best
  • bsky.app: Google's AI comeback is official. Gemini 2.5 Pro Experimental leads in benchmarks for coding, math, science, writing, instruction following, and more, ahead of OpenAI's o3-mini, OpenAI's GPT-4.5, Anthropic's Claude 3.7, xAI's Grok 3, and DeepSeek's R1. The narrative has finally shifted.
  • Shelly Palmer: Google’s Gemini 2.5: AI That Thinks Before It Speaks
  • bdtechtalks.com: Gemini 2.5 Pro is a new reasoning model that excels in long-context tasks and benchmarks, revitalizing Google’s AI strategy against competitors like OpenAI.
  • Interconnects: The end of a busy spring of model improvements and what's next for the presumed leader in AI abilities.
  • www.techradar.com: Gemini 2.5 is now available for Advanced users and it seriously improves Google’s AI reasoning
  • www.zdnet.com: Google releases 'most intelligent' experimental Gemini 2.5 Pro - here's how to try it
  • Unite.AI: Gemini 2.5 Pro is Here—And it Changes the AI Game (Again)
  • TestingCatalog: Gemini 2.5 Pro sets new AI benchmark and launches on AI Studio and Gemini
  • Analytics Vidhya: Google DeepMind's latest AI model, Gemini 2.5 Pro, has reached the #1 position on the Arena leaderboard.
  • AI News: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date
  • Fello AI: Google’s Gemini 2.5 Shocks the World: Crushing AI Benchmark Like No Other AI Model!
  • Analytics India Magazine: Google Unveils Gemini 2.5, Crushes OpenAI GPT-4.5, DeepSeek R1, & Claude 3.7 Sonnet
  • Practical Technology: Practical Tech covers the launch of Google's Gemini 2.5 Pro and its new AI benchmark achievements.
  • Shelly Palmer: Google's Gemini 2.5: AI That Thinks Before It Speaks
  • www.producthunt.com: Google's most intelligent AI model
  • Windows Copilot News: Google reveals AI ‘reasoning’ model that ‘explicitly shows its thoughts’
  • AI News | VentureBeat: Hands on with Gemini 2.5 Pro: why it might be the most useful reasoning model yet
  • thezvi.wordpress.com: Gemini 2.5 Pro Experimental is America’s next top large language model. That doesn’t mean it is the best model for everything. In particular, it’s still Gemini, so it still is a proud member of the Fun Police, in terms of …
  • www.computerworld.com: Gemini 2.5 can, among other things, analyze information, draw logical conclusions, take context into account, and make informed decisions.
  • www.infoworld.com: Google introduces Gemini 2.5 reasoning models
  • Maginative: Google's Gemini 2.5 Pro leads AI benchmarks with enhanced reasoning capabilities, positioning it ahead of competing models from OpenAI and others.
  • www.infoq.com: Google's Gemini 2.5 Pro is a powerful new AI model that's quickly becoming a favorite among developers and researchers. It's capable of advanced reasoning and excels in complex tasks.
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using – and 4 reasons it matters for enterprise AI
  • Communications of the ACM: Google has released Gemini 2.5 Pro, an updated AI model focused on enhanced reasoning, code generation, and multimodal processing.
  • The Next Web: Google has released Gemini 2.5 Pro, an updated AI model focused on enhanced reasoning, code generation, and multimodal processing.
  • www.tomsguide.com: Gemini 2.5 Pro is now free to all users in surprise move
  • Composio: Google just launched Gemini 2.5 Pro on March 26th, claiming to be the best in coding, reasoning and overall everything. But I The post appeared first on .
  • Composio: Google's Gemini 2.5 Pro, released on March 26th, is being hailed for its enhanced reasoning, coding, and multimodal capabilities.
  • Analytics India Magazine: Gemini 2.5 Pro is better than the Claude 3.7 Sonnet for coding in the Aider Polyglot leaderboard.
  • www.zdnet.com: Gemini's latest model outperforms OpenAI's o3 mini and Anthropic's Claude 3.7 Sonnet on the latest benchmarks. Here's how to try it.
  • www.marketingaiinstitute.com: [The AI Show Episode 142]: ChatGPT’s New Image Generator, Studio Ghibli Craze and Backlash, Gemini 2.5, OpenAI Academy, 4o Updates, Vibe Marketing & xAI Acquires X
  • www.tomsguide.com: Gemini 2.5 is free, but can it beat DeepSeek?
  • www.tomsguide.com: Google Gemini could soon help your kids with their homework — here’s what we know
  • PCWorld: Google’s latest Gemini 2.5 Pro AI model is now free for all users
  • www.techradar.com: Google just made Gemini 2.5 Pro Experimental free for everyone, and that's awesome.
  • Last Week in AI: #205 - Gemini 2.5, ChatGPT Image Gen, Thoughts of LLMs