News from the AI & ML world

DeeperML - #reasoning

Michael Nuñez@AI News | VentureBeat //
Anthropic has been at the forefront of investigating how AI models like Claude process information and make decisions. Their scientists developed interpretability techniques that have unveiled surprising behaviors within these systems. Research indicates that large language models (LLMs) are capable of planning ahead, as demonstrated when writing poetry or solving problems, and that they sometimes work backward from a desired conclusion rather than relying solely on provided facts.

Anthropic researchers also tested the "faithfulness" of CoT models' reasoning by giving them hints in their answers, and see if they will acknowledge it. The study found that reasoning models often avoided mentioning that they used hints in their responses. This raises concerns about the reliability of chains-of-thought (CoT) as a tool for monitoring AI systems for misaligned behaviors, especially as these models become more intelligent and integrated into society. The research emphasizes the need for ongoing efforts to enhance the transparency and trustworthiness of AI reasoning processes.

Recommended read:
References :
  • venturebeat.com: Anthropic scientists expose how AI actually ‘thinks’ — and discover it secretly plans ahead and sometimes lies
  • The Algorithmic Bridge: AI Is Learning to Reason. Humans May Be Holding It Back
  • THE DECODER: Anthropic study finds language models often hide their reasoning process

Ellie Ramirez-Camara@Data Phoenix //
Google has launched Gemini 2.5 Pro, hailed as its most intelligent "thinking model" to date. This new AI model excels in reasoning and coding benchmarks, featuring an impressive 1M token context window. Gemini 2.5 Pro is currently accessible to Gemini Advanced users, with integration into Vertex AI planned for the near future. The model has already secured the top position on the Chatbot Arena LLM Leaderboard, showcasing its superior performance in areas like math, instruction following, creative writing, and handling challenging prompts.

Gemini 2.5 Pro represents a new category of "thinking models" designed to enhance performance through reasoning before responding. Google reports that it achieved this level of performance by combining an enhanced base model with improved post-training techniques and aims to build these capabilities into all of their models. The model also obtained leading scores in math and science benchmarks, including GPQA and AIME 2025, without using test-time techniques. A significant focus for the Gemini 2.5 development has been coding performance, where Google reports that the new model excels at creating visual.

Recommended read:
References :
  • Data Phoenix: Google Unveils Gemini 2.5: Its Most Intelligent AI Model Yet
  • www.csoonline.com: Google adds end-to-end email encryption to Gmail
  • GZERO Media: Meet Isomorphic Labs, the Google spinoff that aims to cure you
  • www.tomsguide.com: Google Gemini could soon help your kids with their homework — here’s what we know
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using – and 4 reasons it matters for enterprise AI
  • www.techrepublic.com: Google’s Gemini 2.5 Pro is Better at Coding, Math & Science Than Your Favourite AI Model
  • TestingCatalog: Google plans new Gemini model launch ahead of Cloud Next event
  • Simon Willison's Weblog: Google's Gemini 2.5 Pro is currently the top model and, from , a superb model for OCR, audio transcription and long-context coding.
  • AI News | VentureBeat: Gemini 2.5 Pro is now available without limits and for cheaper than Claude, GPT-4o
  • eWEEK: Google has launched Gemini 2.5 Pro, its most intelligent "thinking model" to date.
  • THE DECODER: Google expands access to Gemini 2.5 Pro amid strong benchmark results
  • The Tech Basic: Google introduced its latest AI model, Gemini 2.5 Pro, in the market. The model exists specifically to perform difficult mathematical and coding operations. The system shows aptitude for solving difficult problems and logical reasoning. Many users praise the high speed and effectiveness of this model. However, the model comes with a high cost for its
  • bsky.app: Gemini 2.5 Pro pricing was announced today - it's cheaper than both GPT-4o and Claude 3.7 Sonnet I've updated my llm-gemini plugin to add support for the new paid model
  • The Cognitive Revolution: Scaling "Thinking": Gemini 2.5 Tech Lead Jack Rae on Reasoning, Long Context, & the Path to AGI
  • www.zdnet.com: Gemini Pro 2.5 is a stunningly capable coding assistant - and a big threat to ChatGPT

Matt Marshall@AI News | VentureBeat //
Microsoft is enhancing its Copilot Studio platform with AI-driven improvements, introducing deep reasoning capabilities that enable agents to tackle intricate problems through methodical thinking and combining AI flexibility with deterministic business process automation. The company has also unveiled specialized deep reasoning agents for Microsoft 365 Copilot, named Researcher and Analyst, to help users achieve tasks more efficiently. These agents are designed to function like personal data scientists, processing diverse data sources and generating insights through code execution and visualization.

Microsoft's focus includes securing AI and using it to bolster security measures, as demonstrated by the upcoming Microsoft Security Copilot agents and new security features. Microsoft aims to provide an AI-first, end-to-end security platform that helps organizations secure their future, one example being the AI agents designed to autonomously assist with phishing, data security, and identity management. The Security Copilot tool will automate routine tasks, allowing IT and security staff to focus on more complex issues, aiding in defense against cyberattacks.

Recommended read:
References :
  • Microsoft Security Blog: Learn about the upcoming availability of Microsoft Security Copilot agents and other new offerings for a more secure AI future.
  • www.zdnet.com: Designed for Microsoft's Security Copilot tool, the AI-powered agents will automate basic tasks, freeing IT and security staff to tackle more complex issues.

Maximilian Schreiner@THE DECODER //
Google DeepMind has announced Gemini 2.5 Pro, its latest and most advanced AI model to date. This new model boasts enhanced reasoning capabilities and improved accuracy, marking a significant step forward in AI development. Gemini 2.5 Pro is designed with built-in 'thinking' capabilities, enabling it to break down complex tasks into multiple steps and analyze information more effectively before generating a response. This allows the AI to deduce logical conclusions, incorporate contextual nuances, and make informed decisions with unprecedented accuracy, according to Google.

The Gemini 2.5 Pro has already secured the top position on the LMArena leaderboard, surpassing other AI models in head-to-head comparisons. This achievement highlights its superior performance and high-quality style in handling intricate tasks. The model also leads in math and science benchmarks, demonstrating its advanced reasoning capabilities across various domains. This new model is available as Gemini 2.5 Pro (experimental) on Google’s AI Studio and for Gemini Advanced users on the Gemini chat interface.

Recommended read:
References :
  • Google DeepMind Blog: Gemini 2.5: Our most intelligent AI model
  • Shelly Palmer: Google’s Gemini 2.5: AI That Thinks Before It Speaks
  • AI News: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date
  • Interconnects: Gemini 2.5 Pro and Google's second chance with AI
  • SiliconANGLE: Google introduces Gemini 2.5 Pro with chain-of-thought reasoning built-in
  • AI News | VentureBeat: Google releases ‘most intelligent model to date,’ Gemini 2.5 Pro
  • Analytics Vidhya: Gemini 2.5 Pro is Now #1 on Chatbot Arena with Impressive Jump
  • www.tomsguide.com: Google unveils Gemini 2.5 — claims AI breakthrough with enhanced reasoning and multimodal power
  • Fello AI: Google’s Gemini 2.5 Shocks the World: Crushing AI Benchmark Like No Other AI Model!
  • bdtechtalks.com: What to know about Google Gemini 2.5 Pro
  • TestingCatalog: Gemini 2.5 Pro sets new AI benchmark and launches on AI Studio and Gemini
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using – and 4 reasons it matters for enterprise AI
  • thezvi.wordpress.com: Gemini 2.5 is the New SoTA
  • www.infoworld.com: Google has introduced version 2.5 of its , which the company said offers a new level of performance by combining an enhanced base model with improved post-training.
  • Composio: Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison
  • Composio: Google dropped its best-ever creation, Gemini 2.5 Pro Experimental, on March 25. It is a stupidly incredible reasoning model shining on every The post first appeared on.
  • www.tomsguide.com: Gemini 2.5 Pro is now free to all users in surprise move
  • Analytics India Magazine: Did Google Just Build The Best AI Model for Coding?
  • www.zdnet.com: Everyone can now try Gemini 2.5 Pro - for free

Matt Marshall@AI News | VentureBeat //
Microsoft is introducing two new AI reasoning agents, Researcher and Analyst, to Microsoft 365 Copilot. These "first-of-their-kind" agents are designed to tackle complex problems by utilizing methodical thinking, offering users a more efficient workflow. The Researcher agent combines OpenAI’s deep research model with Microsoft 365 Copilot’s advanced orchestration and search capabilities to deliver insights with greater quality and accuracy, helping users perform complex, multi-step research.

Analyst, built on OpenAI’s o3-mini reasoning model, functions like a skilled data scientist and is optimized for advanced data analysis. It uses chain-of-thought reasoning to iteratively refine its analysis and provide high-quality answers, mirroring human analytical thinking. This agent can run Python to tackle complex data queries and can turn raw data scattered across spreadsheets into visualizations or revenue projections. Researcher and Analyst will be available to customers with a Microsoft 365 Copilot license in April as part of a new “Frontier” program.

Recommended read:
References :
  • AI News | VentureBeat: Microsoft announced Tuesday two significant additions to its Copilot Studio platform: deep reasoning capabilities that enable agents to tackle complex problems through careful, methodical thinking, and agent flows that combine AI flexibility with deterministic business process automation.
  • Source Asia: Introducing two, first-of-their-kind reasoning agents in Microsoft 365 Copilot.
  • www.zdnet.com: Microsoft releases its answer to OpenAI and Google's Deep Research.

Maximilian Schreiner@THE DECODER //
Google has unveiled Gemini 2.5 Pro, its latest and "most intelligent" AI model to date, showcasing significant advancements in reasoning, coding proficiency, and multimodal functionalities. According to Google, these improvements come from combining a significantly enhanced base model with improved post-training techniques. The model is designed to analyze complex information, incorporate contextual nuances, and draw logical conclusions with unprecedented accuracy. Gemini 2.5 Pro is now available for Gemini Advanced users and on Google's AI Studio.

Google emphasizes the model's "thinking" capabilities, achieved through chain-of-thought reasoning, which allows it to break down complex tasks into multiple steps and reason through them before responding. This new model can handle multimodal input from text, audio, images, videos, and large datasets. Additionally, Gemini 2.5 Pro exhibits strong performance in coding tasks, surpassing Gemini 2.0 in specific benchmarks and excelling at creating visually compelling web apps and agentic code applications. The model also achieved 18.8% on Humanity’s Last Exam, demonstrating its ability to handle complex knowledge-based questions.

Recommended read:
References :
  • SiliconANGLE: Google LLC said today it’s updating its flagship Gemini artificial intelligence model family by introducing an experimental Gemini 2.5 Pro version.
  • The Tech Basic: Google's New AI Models “Think” Before Answering, Outperform Rivals
  • AI News | VentureBeat: Google releases ‘most intelligent model to date,’ Gemini 2.5 Pro
  • Analytics Vidhya: We Tried the Google 2.5 Pro Experimental Model and It’s Mind-Blowing!
  • www.tomsguide.com: Google unveils Gemini 2.5 — claims AI breakthrough with enhanced reasoning and multimodal power
  • Google DeepMind Blog: Gemini 2.5: Our most intelligent AI model
  • THE DECODER: Google Deepmind has introduced Gemini 2.5 Pro, which the company describes as its most capable AI model to date. The article appeared first on .
  • intelligence-artificielle.developpez.com: Google DeepMind a lancé Gemini 2.5 Pro, un modèle d'IA qui raisonne avant de répondre, affirmant qu'il est le meilleur sur plusieurs critères de référence en matière de raisonnement et de codage
  • The Tech Portal: Google unveils Gemini 2.5, its most intelligent AI model yet with ‘built-in thinking’
  • Ars OpenForum: Google says the new Gemini 2.5 Pro model is its “smartest†AI yet
  • The Official Google Blog: Gemini 2.5: Our most intelligent AI model
  • www.techradar.com: I pitted Gemini 2.5 Pro against ChatGPT o3-mini to find out which AI reasoning model is best
  • bsky.app: Google's AI comeback is official. Gemini 2.5 Pro Experimental leads in benchmarks for coding, math, science, writing, instruction following, and more, ahead of OpenAI's o3-mini, OpenAI's GPT-4.5, Anthropic's Claude 3.7, xAI's Grok 3, and DeepSeek's R1. The narrative has finally shifted.
  • Shelly Palmer: Google’s Gemini 2.5: AI That Thinks Before It Speaks
  • bdtechtalks.com: Gemini 2.5 Pro is a new reasoning model that excels in long-context tasks and benchmarks, revitalizing Google’s AI strategy against competitors like OpenAI.
  • Interconnects: The end of a busy spring of model improvements and what's next for the presumed leader in AI abilities.
  • www.techradar.com: Gemini 2.5 is now available for Advanced users and it seriously improves Google’s AI reasoning
  • www.zdnet.com: Google releases 'most intelligent' experimental Gemini 2.5 Pro - here's how to try it
  • Unite.AI: Gemini 2.5 Pro is Here—And it Changes the AI Game (Again)
  • TestingCatalog: Gemini 2.5 Pro sets new AI benchmark and launches on AI Studio and Gemini
  • Analytics Vidhya: Google DeepMind's latest AI model, Gemini 2.5 Pro, has reached the #1 position on the Arena leaderboard.
  • AI News: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date
  • Fello AI: Google’s Gemini 2.5 Shocks the World: Crushing AI Benchmark Like No Other AI Model!
  • Analytics India Magazine: Google Unveils Gemini 2.5, Crushes OpenAI GPT-4.5, DeepSeek R1, & Claude 3.7 Sonnet
  • Practical Technology: Practical Tech covers the launch of Google's Gemini 2.5 Pro and its new AI benchmark achievements.
  • Shelly Palmer: Google's Gemini 2.5: AI That Thinks Before It Speaks
  • www.producthunt.com: Google's most intelligent AI model
  • Windows Copilot News: Google reveals AI ‘reasoning’ model that ‘explicitly shows its thoughts’
  • AI News | VentureBeat: Hands on with Gemini 2.5 Pro: why it might be the most useful reasoning model yet
  • thezvi.wordpress.com: Gemini 2.5 Pro Experimental is America’s next top large language model. That doesn’t mean it is the best model for everything. In particular, it’s still Gemini, so it still is a proud member of the Fun Police, in terms of …
  • www.computerworld.com: Gemini 2.5 can, among other things, analyze information, draw logical conclusions, take context into account, and make informed decisions.
  • www.infoworld.com: Google introduces Gemini 2.5 reasoning models
  • Maginative: Google's Gemini 2.5 Pro leads AI benchmarks with enhanced reasoning capabilities, positioning it ahead of competing models from OpenAI and others.
  • www.infoq.com: Google's Gemini 2.5 Pro is a powerful new AI model that's quickly becoming a favorite among developers and researchers. It's capable of advanced reasoning and excels in complex tasks.
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using – and 4 reasons it matters for enterprise AI
  • Communications of the ACM: Google has released Gemini 2.5 Pro, an updated AI model focused on enhanced reasoning, code generation, and multimodal processing.
  • The Next Web: Google has released Gemini 2.5 Pro, an updated AI model focused on enhanced reasoning, code generation, and multimodal processing.
  • www.tomsguide.com: Gemini 2.5 Pro is now free to all users in surprise move
  • Composio: Google just launched Gemini 2.5 Pro on March 26th, claiming to be the best in coding, reasoning and overall everything. But I The post appeared first on .
  • Composio: Google's Gemini 2.5 Pro, released on March 26th, is being hailed for its enhanced reasoning, coding, and multimodal capabilities.
  • Analytics India Magazine: Gemini 2.5 Pro is better than the Claude 3.7 Sonnet for coding in the Aider Polyglot leaderboard.
  • www.zdnet.com: Gemini's latest model outperforms OpenAI's o3 mini and Anthropic's Claude 3.7 Sonnet on the latest benchmarks. Here's how to try it.
  • www.marketingaiinstitute.com: [The AI Show Episode 142]: ChatGPT’s New Image Generator, Studio Ghibli Craze and Backlash, Gemini 2.5, OpenAI Academy, 4o Updates, Vibe Marketing & xAI Acquires X
  • www.tomsguide.com: Gemini 2.5 is free, but can it beat DeepSeek?
  • www.tomsguide.com: Google Gemini could soon help your kids with their homework — here’s what we know
  • PCWorld: Google’s latest Gemini 2.5 Pro AI model is now free for all users
  • www.techradar.com: Google just made Gemini 2.5 Pro Experimental free for everyone, and that's awesome.
  • Last Week in AI: #205 - Gemini 2.5, ChatGPT Image Gen, Thoughts of LLMs
  • Data Phoenix: Google Unveils Gemini 2.5: Its Most Intelligent AI Model Yet
  • SiliconANGLE: AWS brings its generative AI assistant to the Amazon OpenSearch Service

Ben Lorica@Gradient Flow //
AI is rapidly evolving, moving beyond simple predictions to tackle complex, structured problems. New AI agents, open models, and a focus on productivity are driving this change, with reasoning and deep research capabilities leading the way. China's Monica.ai has developed Manus, an AI agent that autonomously handles tasks like real estate research and resume analysis, demonstrating the potential of multi-agent systems.

OpenAI is also advancing agent-building tools, including the Responses API and Computer Use Tool, enabling developers to create sophisticated applications. Additionally, Google's Gemma 3 offers an open-weight language model family, providing developers with greater flexibility. These advancements signify a shift towards artificial general intelligence (AGI), where AI can genuinely reason and solve novel problems.

AI's impact is also evident in content creation, with platforms like n8n enabling users to build content creator agents without coding. These agents can automate tasks and streamline workflows, boosting productivity for individuals and businesses. This reflects a broader trend of AI being used to enhance productivity across various sectors, from research to code generation and content creation.

Recommended read:
References :
  • Gradient Flow: Discusses new AI agents, open models, and the race for productivity.
  • AI News | VentureBeat: Details how reasoning and deep research are expanding AI from statistical prediction to structured problem-solving.

Matthias Bastian@THE DECODER //
Baidu has launched two new AI models, ERNIE 4.5 and ERNIE X1, designed to compete with DeepSeek's R1 model. The company is making these models freely accessible to individual users through the ERNIE Bot platform, ahead of the initially planned schedule. ERNIE 4.5 is a multimodal foundation model, integrating text, images, audio, and video to enhance understanding and content generation across various data types. This model demonstrates significant improvements in language understanding, reasoning, and coding abilities.

ERNIE X1 is Baidu's first model specifically designed for complex reasoning tasks, excelling in logical inference, problem-solving, and structured decision-making suitable for applications in finance, law, and data analysis. Baidu claims that ERNIE X1 matches DeepSeek R1’s performance at half the cost. ERNIE 4.5 has shown performance on par with models like DeepSeek-R1, but at approximately half the deployment cost.

Recommended read:
References :
  • AiThority: With the launch of ERNIE 4.5 and ERNIE X1, ERNIE Bot is made free to the public ahead of schedule, and users can access both models free of charge.
  • techxplore.com: Chinese internet search giant Baidu released a new artificial intelligence reasoning model Sunday and made its AI chatbot services free to consumers as ferocious competition grips the sector.
  • THE DECODER: Baidu claims its Ernie X1 reasoning model matches Deepseek-R1 performance at half the price
  • Analytics Vidhya: China has done it again with its AI models and this time the blow is bigger and better! Baidu – a Chinese AI company, recently released two large language models (LLMs) – ERNIE 4.5 & X1.
  • TestingCatalog: Discover Baidu's new AI models, ERNIE 4.5 and ERNIE X1, now freely accessible via ERNIE Bot. Experience cutting-edge AI tech ahead of schedule!
  • Analytics India Magazine: China’s Baidu Launches Two New AI Models, Rivals DeepSeek R1 at Half the Price
  • TechCrunch: Chinese search engine Baidu has launched two new AI models — Ernie 4.5, the latest version of the company’s foundational model first released two years ago, as well as a new reasoning model, Ernie X1. According to Reuters, Baidu claims that Ernie X1’s performance is “on par with DeepSeek R1 at only
  • AIwire: With the launch of ERNIE 4.5 and ERNIE X1, ERNIE Bot is made free to the public ahead of schedule, and users can access both models free of charge.
  • AI News: Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1
  • AI News | VentureBeat: Baidu has also announced plans to integrate ERNIE 4.5 and ERNIE X1 into its broader ecosystem, including Baidu Search and the Wenxiaoyan app.
  • www.tomshardware.com: ERNIE 4.5 AI model by Baidu claims to match DeepSeek R1 at half the cost
  • Fello AI: Baidu’s New ERNIE 4.5 & X1 – A Free AI That Is Better Than GPT-4.5 & Costs Pennies!

Matthew S.@IEEE Spectrum //
Recent research indicates that AI models, particularly large language models (LLMs), can struggle with overthinking and analysis paralysis, impacting their efficiency and success rates. A study has found that reasoning LLMs sometimes overthink problems, which leads to increased computational costs and a reduction in their overall performance. This issue is being addressed through various optimization techniques, including scaling inference-time compute, reinforcement learning, and supervised fine-tuning, to ensure models use only the necessary amount of reasoning for tasks.

The size and training methods of these models play a crucial role in their reasoning abilities. For instance, Alibaba's Qwen team introduced QwQ-32B, a 32-billion-parameter model that outperforms much larger rivals in key problem-solving tasks. QwQ-32B achieves superior performance in math, coding, and scientific reasoning using multi-stage reinforcement learning, despite being significantly smaller than DeepSeek-R1. This advancement highlights the potential of reinforcement learning to unlock reasoning capabilities in smaller models, rivaling the performance of giant models while requiring less computational power.

Recommended read:
References :
  • IEEE Spectrum: It’s Not Just Us: AI Models Struggle With Overthinking
  • Sebastian Raschka, PhD: This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling that have emerged since the release of DeepSeek R1.

@bdtechtalks.com //
Alibaba has recently launched Qwen-32B, a new reasoning model, which demonstrates performance levels on par with DeepMind's R1 model. This development signifies a notable achievement in the field of AI, particularly for smaller models. The Qwen team showcased that reinforcement learning on a strong base model can unlock reasoning capabilities for smaller models that enhances their performance to be on par with giant models.

Qwen-32B not only matches but also surpasses models like DeepSeek-R1 and OpenAI's o1-mini across key industry benchmarks, including AIME24, LiveBench, and BFCL. This is significant because Qwen-32B achieves this level of performance with only approximately 5% of the parameters used by DeepSeek-R1, resulting in lower inference costs without compromising on quality or capability. Groq is offering developers the ability to build FAST with Qwen QwQ 32B on GroqCloud™, running the 32B parameter model at ~400 T/s. This model is proving to be very competitive in reasoning benchmarks and is one of the top open source models being used.

The Qwen-32B model was explicitly designed for tool use and adapting its reasoning based on environmental feedback, which is a huge win for AI agents that need to reason, plan, and adapt based on context (outperforms R1 and o1-mini on the Berkeley Function Calling Leaderboard). With these capabilities, Qwen-32B shows that RL on a strong base model can unlock reasoning capabilities for smaller models that enhances their performance to be on par with giant models.

Recommended read:
References :
  • Last Week in AI: LWiAI Podcast #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors
  • Groq: A Guide to Reasoning with Qwen QwQ 32B
  • Last Week in AI: #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors
  • Sebastian Raschka, PhD: This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling that have emerged since the release of DeepSeek R1.
  • Analytics Vidhya: China is rapidly advancing in AI, releasing models like DeepSeek and Qwen to rival global giants.
  • Last Week in AI: Alibaba’s New QwQ 32B Model is as Good as DeepSeek-R1
  • Maginative: Despite having far fewer parameters, Qwen’s new QwQ-32B model outperforms DeepSeek-R1 and OpenAI’s o1-mini in mathematical benchmarks and scientific reasoning, showcasing the power of reinforcement learning.

@bdtechtalks.com //
References: Groq , Analytics Vidhya , bdtechtalks.com ...
Alibaba's Qwen team has unveiled QwQ-32B, a 32-billion-parameter reasoning model that rivals much larger AI models in problem-solving capabilities. This development highlights the potential of reinforcement learning (RL) in enhancing AI performance. QwQ-32B excels in mathematics, coding, and scientific reasoning tasks, outperforming models like DeepSeek-R1 (671B parameters) and OpenAI's o1-mini, despite its significantly smaller size. Its effectiveness lies in a multi-stage RL training approach, demonstrating the ability of smaller models with scaled reinforcement learning to match or surpass the performance of giant models.

The QwQ-32B is not only competitive in performance but also offers practical advantages. It is available as open-weight under an Apache 2.0 license, allowing businesses to customize and deploy it without restrictions. Additionally, QwQ-32B requires significantly less computational power, running on a single high-end GPU compared to the multi-GPU setups needed for larger models like DeepSeek-R1. This combination of performance, accessibility, and efficiency positions QwQ-32B as a valuable resource for the AI community and enterprises seeking to leverage advanced reasoning capabilities.

Recommended read:
References :
  • Groq: A Guide to Reasoning with Qwen QwQ 32B
  • Analytics Vidhya: Qwen’s QwQ-32B: Small Model with Huge Potential
  • Maginative: Alibaba's Latest AI Model, QwQ-32B, Beats Larger Rivals in Math and Reasoning
  • bdtechtalks.com: Alibaba’s QwQ-32B reasoning model matches DeepSeek-R1, outperforms OpenAI o1-mini
  • Last Week in AI: LWiAI Podcast #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors

Ryan Daws@AI News //
Alibaba's Qwen team has launched QwQ-32B, a 32-billion parameter AI model, designed to rival the performance of much larger models like DeepSeek-R1, which has 671 billion parameters. This new model highlights the effectiveness of scaling Reinforcement Learning (RL) on robust foundation models. QwQ-32B leverages continuous RL scaling to demonstrate significant improvements in areas like mathematical reasoning and coding proficiency.

The Qwen team successfully integrated agent capabilities into the reasoning model, allowing it to think critically, use tools, and adapt its reasoning based on environmental feedback. The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities. QwQ-32B is available as open-weight on Hugging Face and on ModelScope under an Apache 2.0 license, allowing for both commercial and research uses.

Recommended read:
References :
  • AI News | VentureBeat: Alibaba's new open source model QwQ-32B matches DeepSeek-R1 with way smaller compute requirements
  • Analytics Vidhya: In the world of large language models (LLMs) there is an assumption that larger models inherently perform better. Qwen has recently introduced its latest model, QwQ-32B, positioning it as a direct competitor to the massive DeepSeek-R1 despite having significantly fewer parameters.
  • AI News: The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.
  • www.infoworld.com: Alibaba Cloud on Thursday launched QwQ-32B, a compact reasoning model built on its latest large language model (LLM), Qwen2.5-32b, one it says delivers performance comparable to other large cutting edge models, including Chinese rival DeepSeek and OpenAI’s o1, with only 32 billion parameters.
  • THE DECODER: Alibaba's latest AI model demonstrates how reinforcement learning can create efficient systems that match the capabilities of much larger models.
  • bdtechtalks.com: Alibaba’s QwQ-32B reasoning model matches DeepSeek-R1, outperforms OpenAI o1-mini
  • Last Week in AI: Alibaba’s New QwQ 32B Model is as Good as DeepSeek-R1
  • Last Week in AI: LWiAI Podcast #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors
  • Last Week in AI: #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors

Divyesh Vitthal@MarkTechPost //
Large language models (LLMs) are facing scrutiny regarding their reasoning capabilities and tendency to produce hallucinations, instances where they generate incorrect or fabricated information. Andrej Karpathy, former Senior Director of AI at Tesla, suggests that these hallucinations are emergent cognitive effects arising from the LLM training pipeline. He explains that LLMs predict words based on patterns in their training data, rather than possessing factual knowledge like humans. This leads to situations where models generate plausible-sounding but entirely false information.

Researchers are actively working on improving the reasoning skills of LLMs while minimizing computational costs. One approach involves using distilled reasoners, which allow for faster and more efficient inference. Additionally, Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that rivals the performance of the much larger DeepSeek-R1. This model achieves comparable results with significantly fewer parameters through reinforcement learning, showcasing the potential for enhancing model performance beyond conventional pretraining and post-training methods.

Recommended read:
References :
  • LearnAI: This article discusses large language models (LLMs) and their hallucinations, with an emphasis on Andrej Karpathy's explanation of this phenomenon.
  • Sebastian Raschka, PhD: This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling that have emerged since the release of DeepSeek R1.

@techstrong.ai //
Microsoft has been making significant strides in the realm of artificial intelligence, with developments ranging from new Copilot features to breakthroughs in quantum computing and large language models. The company recently released a native Copilot application for macOS, bringing the AI assistant to Mac users with features similar to the Windows version, including image uploading and text generation. The macOS version includes dark mode and a shortcut command for easy activation. Microsoft also removed limits to Copilot Voice and Think Deeper, allowing Copilot users to have extended conversations with the AI assistant.

Microsoft AI has also released LongRoPE2, a near-lossless method designed to extend the context window of Large Language Models (LLMs) to 128K tokens while retaining a high degree of short-context accuracy. This addresses a key limitation in LLMs, which often struggle to process long-context sequences effectively. In the quantum computing space, Microsoft researchers announced the creation of the first “topological qubits” in a device, representing a potential leap forward in the field.

Recommended read:
References :
  • MarkTechPost: Microsoft AI Releases LongRoPE2: A Near-Lossless Method to Extend Large Language Model Context Windows to 128K Tokens While Retaining Over 97% Short-Context Accuracy
  • THE DECODER: Microsoft readies in-house AI models to rival OpenAI and Anthropic, plans API access in 2025
  • TechCrunch: Microsoft reportedly ramps up AI efforts to compete with OpenAI
  • TestingCatalog: Microsoft working on voice avatars, generative layouts, and agents for Copilot
  • Source: Discover how Microsoft secures AI models on Azure AI Foundry, ensuring robust security and trustworthy deployments for your AI systems. The post appeared first on .

Esra Kayabali@AWS News Blog //
Anthropic has launched Claude 3.7 Sonnet, their most advanced AI model to date, designed for practical use in both business and development. The model is described as a hybrid system, offering both quick responses and extended, step-by-step reasoning for complex problem-solving. This versatility eliminates the need for separate models for different tasks. The company emphasized Claude 3.7 Sonnet’s strength in coding tasks. The model's reasoning capabilities allow it to analyze and modify complex codebases more effectively than previous versions and can process up to 128K tokens.

Anthropic also introduced Claude Code, an agentic coding tool, currently in limited research preview. The tool promises to revolutionize coding by automating parts of a developer's job. Claude 3.7 Sonnet is accessible across all Anthropic plans, including Free, Pro, Team, and Enterprise, and via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Extended thinking mode is reserved for paid subscribers. Pricing is set at $3 per million input tokens and $15 per million output tokens. Anthropic stated they reduced unnecessary refusals by 45% compared to its predecessor.

Recommended read:
References :
  • AI & Machine Learning: Anthropic's Claude 3.7 Sonnet available on Vertex AI
  • Fello AI: Claude 3.7 Sonnet is a new release from Anthropic
  • PCMag Middle East ai: PCMag highlights the key features and trends embodied by Claude 3.7 Sonnet.
  • venturebeat.com: Claude 3.7 Sonnet aims to compete with other major AI models
  • Analytics Vidhya: Anthropic's new model can manage two types of information processing at once
  • Analytics Vidhya: Claude 3.7 Sonnet vs Grok 3: Which LLM is Better at Coding?
  • Digital Information World: Digital Information World reports on the launch of Claude 3.7 Sonnet and its competitive landscape.
  • Shelly Palmer: Claude 3.7 Sonnet: Coding Meets Reasoning
  • OODAloop: A new generation of AIs: Claude 3.7 and Grok 3
  • AWS News Blog: Anthropic’s Claude 3.7 Sonnet hybrid reasoning model is now available in Amazon Bedrock
  • Analytics Vidhya: Claude 3.7 Sonnet: The Best Coding Model Yet?
  • blog.jetbrains.com: Anthropic's Claude 3.7 Sonnet is a new AI reasoning model, described as a hybrid system blending fast responses with detailed reasoning, adjustable for various tasks. It is particularly strong in coding and demonstrates remarkable accuracy on real-world software tasks. It is designed to handle both quick answers and more challenging tasks.
  • Analytics Vidhya: Artificial intelligence is immensely revolutionizing technology, providing performance enhancements, tweaks, and improvements with each generation of models. One of its latest developments is the Anthropics Claude 3.7 Sonnet- a sophisticated AI model that primes itself for changing creative, analytical, and coding tasks. It offers new improved Claude code with great tools designed for automating and
  • Towards AI: TAI #141: Claude 3.7 Sonnet; Software Dev Focus in Anthropic’s First Thinking Model headline feature is its “extended thinkingâ€� mode, where the model now explicitly shows multi-step reasoning before finalizing answers.

@www.analyticsvidhya.com //
References: Composio , Analytics Vidhya
OpenAI recently launched o3-mini, the first model in its o3 family, which includes two specialized variants: o3-mini-high and o3-mini-low. The o3-mini-high variant is designed to spend more time reasoning, providing more in-depth answers, while the o3-mini-low prioritizes speed for quicker responses. Benchmarks indicate that o3-mini displays comparable performance to OpenAI's o1 model, but at a significantly lower cost.

Researchers have noted that o3-mini is roughly 15 times cheaper and about five times faster than o1. Despite being cheaper than GPT-4o, o3-mini has a stricter usage limit of 150 messages per hour, raising questions about OpenAI's subsidization strategy. The model has demonstrated superior performance to o1 on benchmarks such as FrontierMath, Codeforces, and GPQA. Additionally, o3-mini is the first reasoning model from OpenAI to feature official function-calling support, making it particularly useful for AI agents.

Recommended read:
References :
  • Composio: OpenAI launched its latest model, the o3-mini, last Friday. It is the first member of the o3 family of models. There are
  • Analytics Vidhya: OpenAI's o1 and o3-mini are advanced reasoning models that differ from the base GPT-4 (often referred to as GPT-4o) in how they process prompts and produce answers. These models are designed to spend more time “thinking” through complex problems, mimicking a human’s analytical approach. To leverage these models effectively, it’s crucial to understand how to ...

@the-decoder.com //
Perplexity AI has launched Deep Research, an AI-powered research tool aimed at competing with OpenAI and Google Gemini. Using DeepSeek-R1, Perplexity is offering comprehensive research reports at a much lower cost than OpenAI, with 500 queries per day for $20 per month compared to OpenAI's $200 per month for only 100 queries. The new service automatically conducts dozens of searches and analyzes hundreds of sources to produce detailed reports in one to two minutes.

Perplexity claims Deep Research performs 8 searches and consults 42 sources to generate a 1,300-word report in under 3 minutes. The company says that Deep Research tool works particularly well for finance, marketing, and technology research. The service is launching first on web browsers, with iOS, Android, and Mac versions planned for later release. Perplexity CEO Aravind Srinivas stated he wants to keep making it faster and cheaper for the interest of humanity.

Recommended read:
References :
  • the-decoder.com: Perplexity uses Deepseek-R1 to offer Deep Research 10 times cheaper than OpenAI
  • www.analyticsvidhya.com: Enhancing Multimodal RAG with Deepseek Janus Pro
  • www.marktechpost.com: DeepSeek AI Introduces CODEI/O: A Novel Approach that Transforms Code-based Reasoning Patterns into Natural Language Formats to Enhance LLMs’ Reasoning Capabilities
  • venturebeat.com: Perplexity just made AI research crazy cheap—what that means for the industry
  • Analytics Vidhya: The landscape of AI-powered research just became even more competitive with the launch of Perplexity’s Deep Research. Previously, OpenAI and Google Gemini were leading the way in this space, and now Perplexity has joined the ranks.
  • iHLS: New York State Bans DeepSeek AI App Over Security Concerns
  • NextBigFuture.com: Does DeepSeek Impact the Future of AI Data Centers?
  • THE DECODER: Perplexity's Deep Research utilizes DeepSeek-R1 for generating comprehensive research reports.
  • www.ghacks.net: Perplexity AI has unveiled its latest feature, the 'Deep Research' tool, designed to enhance users' ability to conduct comprehensive research on complex topics.
  • PCMag Middle East ai: Perplexity Launches a Free 'Deep Research' AI Tool
  • bsky.app: Perplexity follows OpenAI with the release of its Deep Research.
  • techstrong.ai: Perplexity AI Launches a Deep Research Tool to Help Humans Research, Deeply
  • Data Phoenix: Perplexity has launched Deep Research, a free AI-powered research tool that can analyze hundreds of sources in minutes to create comprehensive reports across various domains, promising to save users significant research time.
  • eWEEK: Perplexity 1776 Model Fixes DeepSeek-R1’s “Refusal to Respond to Sensitive Topicsâ€�

Ben Lorica@Gradient Flow //
Recent advancements in AI reasoning models are demonstrating step-by-step reasoning, self-correction, and multi-step decision-making, opening up application areas that require logical inference and strategic planning. These new models are rapidly evolving, driven by decreasing training costs which enables faster iteration and higher performance. This is also facilitated by the advent of techniques such as model compression, quantization, and distillation, making it possible to run sophisticated models on less powerful hardware.

The competitive landscape is becoming more global, and teams from countries like China are rapidly closing the performance gap, offering diverse approaches and fostering healthy competition. Open AI's deep research feature enables models to generate detailed reports after prolonged inference periods, which directly competes with Google's Gemini 2.0. It appears that one can spend large amounts of money and get continuous and predictable gains with AI models.

Recommended read:
References :
  • Sam Altman: This article provides insights into the reasoning capabilities of the AI models and their impact on the overall technology landscape.
  • THE DECODER: This article discusses OpenAI's new reasoning models, emphasizing direct instruction over complex prompts.
  • the-decoder.com: OpenAI has published guidelines for effective use of its o-series models, emphasizing direct instruction over complex prompting techniques.
  • www.analyticsvidhya.com: OpenAI’s o1 and o3-mini are advanced reasoning models that differ from the base GPT-4 (often referred to as GPT-4o) in how they process prompts and produce answers.

@www.trendforce.com //
OpenAI is enhancing its AI models with improved reasoning and logical capabilities, focusing on chain-of-thought techniques and multimodal approaches. The company's o3-mini model now offers users a more detailed view of its reasoning process, a move towards greater transparency. This change aims to bridge the gap with models like DeepSeek-R1, which already reveals its complete chain-of-thought.

OpenAI also plans to launch the o3 Deep Research agent to both free and ChatGPT Plus users. This advancement will enable the model to generate comprehensive reports following extended inference periods, directly competing with Google's Gemini 2.0 reasoning models. These improvements highlight OpenAI's ongoing efforts to refine AI reasoning and provide users with more insights into the decision-making processes of its models.

Recommended read:
References :
  • Last Week in AI: #199 - OpenAI's 03-mini, Gemini Thinking, Deep Research, s1
  • bdtechtalks.com: OpenAI reveals o3’s reasoning process to bridge gap with DeepSeek-R1
  • www.analyticsvidhya.com: OpenAI’s o1 and o3-mini are advanced reasoning models that differ from the base GPT-4 (often referred to as GPT-4o) in how they process prompts and produce answers.

Emily Forlini@PCMag Middle East ai //
DeepSeek is emerging as a notable contender in the AI landscape, challenging established players with its DeepSeek-R1 model. Recent analysis highlights DeepSeek-R1's reasoning capabilities, positioning it as a potential alternative to models like OpenAI's GPT. The company's focus on AI infrastructure and model development, combined with its competitive pricing strategy, is attracting attention and driving its expansion.

The affordability of DeepSeek's models, reportedly up to 9 times cheaper than competitors, is a significant factor in its growing popularity. However, some reports suggest that this lower cost may come with trade-offs in terms of latency and potential server resource constraints, impacting the speed of responses. While DeepSeek is expanding, the Center for Security and Emerging Technology has weighed in on the US and China's race to AI dominance.

The DeepSeek-R1 model is built with a Mixture-of-Experts framework that only uses a subset of its parameters per input for high efficiency and scalability.

Recommended read:
References :