News from the AI & ML world

DeeperML - #reasoning

Apple Research Questions LLMs' Ability to Reason - Apple research challenges the reasoning capabilities of Large Reasoning Models (LRMs), suggesting they struggle with basic reasoning tasks and sparking debate within the AI community, with Google on the other side of the debate.

References: TheSequence, chatgptiseatingtheworld.com, arstechnica.com ...

Apple researchers are challenging the perceived reasoning capabilities of Large Reasoning Models (LRMs), sparking debate within the AI community. A recent paper from Apple, titled "The Illusion of Thinking," suggests that these models, which generate intermediate thinking steps like Chain-of-Thought reasoning, struggle with fundamental reasoning tasks. The research indicates that current evaluation methods relying on math and code benchmarks are insufficient, as they often suffer from data contamination and fail to assess the structure or quality of the reasoning process.

To address these shortcomings, Apple researchers introduced controllable puzzle environments, including the Tower of Hanoi, River Crossing, Checker Jumping, and Blocks World, allowing for precise manipulation of problem complexity. These puzzles require diverse reasoning abilities, such as constraint satisfaction and sequential planning, and are free from data contamination. The Apple paper concluded that state-of-the-art LRMs ultimately fail to develop generalizable problem-solving capabilities, with accuracy collapsing to zero beyond certain complexities across different environments.
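To make the idea of a controllable environment concrete, here is a minimal Python sketch of a Tower of Hanoi checker. It is an illustration, not Apple's evaluation code: the disk count `n_disks` is the single complexity dial, and every move in a proposed solution is replayed and checked for legality instead of grading only the final answer. The peg numbering and the `validate_hanoi` name are assumptions made for this example.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (from_peg, to_peg); pegs are numbered 0, 1, 2


def validate_hanoi(n_disks: int, moves: List[Move]) -> bool:
    """Replay a move list on an n-disk Tower of Hanoi and report success.

    Disks are numbered 1 (smallest) to n_disks (largest). All disks start
    on peg 0 and must end on peg 2; any illegal move fails the attempt.
    """
    pegs = [list(range(n_disks, 0, -1)), [], []]  # each peg stored bottom-to-top
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False  # malformed move
        if not pegs[src]:
            return False  # cannot move from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # cannot place a larger disk on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))
```

For instance, the optimal 3-disk sequence `[(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]` returns `True`, while `[(0, 2), (0, 2)]` returns `False` because its second move places disk 2 on top of disk 1.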

However, the Apple research has faced criticism. Experts, like Professor Seok Joon Kwon, argue that Apple's lack of high-performance hardware, such as a large GPU-based cluster comparable to those operated by Google or Microsoft, could be a factor in their findings. Some argue that the models perform better on familiar puzzles, suggesting that their success may be linked to training exposure rather than genuine problem-solving skills. Others, such as Alex Lawsen and "C. Opus," argue that the Apple researchers' results don't support claims about fundamental reasoning limitations, but rather highlight engineering challenges related to token limits and evaluation methods.

Recommended read:

Top link: www.marktechpost.com

References :

TheSequence: The Sequence Research #663: The Illusion of Thinking, Inside the Most Controversial AI Paper of Recent Weeks
chatgptiseatingtheworld.com: Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models. Opus, Lawsen think so.
www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
arstechnica.com: New Apple study challenges whether AI models truly “reason” through problems
9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

nftjedi@chatgptiseatingtheworld.com //

Apple Research Paper on AI Reasoning Faces Criticism - Apple researchers published a paper questioning the reasoning abilities of LLMs, arguing they rely on pattern matching rather than true reasoning, but critics argue that the experiments were unfairly designed.

References: chatgptiseatingtheworld.com, Digital Information World, Bernard Marr ...

Apple researchers recently published a study titled "The Illusion of Thinking," suggesting that large language models (LLMs) struggle with true reasoning and rely instead on pattern matching. The study presented findings based on tasks like the Tower of Hanoi puzzle, where models purportedly failed as complexity increased, leading to the conclusion that these models possess limited problem-solving abilities. However, these conclusions are now under scrutiny, with critics arguing the experiments were not fairly designed.

Alex Lawsen of Open Philanthropy has published a counter-study challenging the foundations of Apple's claims. Lawsen argues that models like Claude, Gemini, and OpenAI's latest systems weren't failing because of cognitive limits, but because the evaluation methods didn't account for key technical constraints. One issue he raised was that models were often cut off before giving full answers because they hit their maximum output-token limit, a built-in cap on output length; Apple's evaluation counted these truncations as reasoning failures rather than practical limitations.
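The token-limit argument comes down to arithmetic: an optimal Tower of Hanoi solution for n disks needs 2^n - 1 moves, so a fully written-out answer doubles in length with each added disk. The sketch below assumes a hypothetical output budget and a hypothetical average number of tokens per written move (neither figure comes from Apple's or Lawsen's papers) and finds the largest puzzle whose complete move list could fit.

```python
def min_moves(n_disks: int) -> int:
    """Length of the optimal Tower of Hanoi solution: 2**n - 1 moves."""
    return 2 ** n_disks - 1


def max_disks_within_budget(token_budget: int, tokens_per_move: int) -> int:
    """Largest disk count whose complete move list fits in the output budget.

    tokens_per_move is an assumed average cost of writing out one move.
    """
    if tokens_per_move <= 0:
        raise ValueError("tokens_per_move must be positive")
    n = 0
    while min_moves(n + 1) * tokens_per_move <= token_budget:
        n += 1
    return n


# Hypothetical figures: a 64,000-token output cap and about 5 tokens per move.
print(max_disks_within_budget(64_000, 5))  # 13
```

Under these assumed figures the 8,191-move answer for 13 disks fits, but the 16,383 moves for 14 disks do not, no matter how well the model understands the procedure.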

Another point of contention involved the River Crossing test, where models faced unsolvable problem setups. When the models correctly identified the tasks as impossible and refused to attempt them, they were still marked wrong. Furthermore, the evaluation system strictly judged outputs against exhaustive solutions, failing to credit models for partial but correct answers, pattern recognition, or strategic shortcuts. To illustrate, Lawsen demonstrated that when models were instructed to write a program to solve the Hanoi puzzle, they delivered accurate, scalable solutions even with 15 disks, contradicting Apple's assertion of limitations.
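Lawsen's programming test works because the whole solution fits in a few lines of code even though the move list grows exponentially. A recursive solver of the kind a model might be asked to write looks like this (a textbook sketch, not the code the models actually produced):

```python
from typing import List, Tuple


def hanoi(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> List[Tuple[int, int]]:
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then restack on top.
    return hanoi(n - 1, src, dst, aux) + [(src, dst)] + hanoi(n - 1, aux, src, dst)


moves = hanoi(15)
print(len(moves))  # 32767, i.e. 2**15 - 1
```

The function is the same size for 3 disks or 15; only its output grows.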

Recommended read:

Top link: chatgptiseatingtheworld.com

References :

chatgptiseatingtheworld.com: Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models. Opus, Lawsen think so.
Digital Information World: Apple’s AI Critique Faces Pushback Over Flawed Testing Methods
NextBigFuture.com: Apple Researcher Claims Illusion of AI Thinking Versus OpenAI Solving Ten Disk Puzzle
Bernard Marr: Beyond The Hype: What Apple's AI Warning Means For Business Leaders

Carl Franzen@AI News | VentureBeat //

Mistral AI Launches Magistral Reasoning Models - Mistral AI launched its first reasoning model family, Magistral, with an open-weight Magistral Small under the Apache 2.0 license and an API-only Magistral Medium, and introduced the Handoffs feature of its Agents API for multi-agent workflows.

References: Simon Willison, Simon Willison's Weblog, AI News | VentureBeat ...

Mistral AI has launched its first reasoning model, Magistral, signaling a commitment to open-source AI development. The Magistral family features two models: Magistral Small, a 24-billion parameter model available with open weights under the Apache 2.0 license, and Magistral Medium, a proprietary model accessible through an API. This dual release strategy aims to cater to both enterprise clients seeking advanced reasoning capabilities and the broader AI community interested in open-source innovation.

Mistral's decision to release Magistral Small under the permissive Apache 2.0 license marks a significant return to its open-source roots. The license allows free use, modification, and distribution of the model, including for commercial purposes. This lets startups and established companies build and deploy their own applications on top of Mistral’s latest reasoning architecture without licensing fees or vendor lock-in. The release serves as a counter-narrative to the company's recent proprietary releases, reaffirming Mistral’s dedication to arming the open community with cutting-edge tools.

Magistral Medium demonstrates competitive performance in the reasoning arena, according to internal benchmarks released by Mistral. The model was tested against its predecessor, Mistral Medium 3, and models from DeepSeek. Separately, the Handoffs feature of Mistral's Agents API supports multi-agent workflows in which different agents collaborate on complex tasks, passing a request to whichever agent is best suited to handle it. This enables modular problem-solving, as in systems where agents collaborate to answer inflation-related questions.
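The handoff pattern itself is simple to sketch. The Python below is a generic illustration, not Mistral's Agents API: a hypothetical `triage` agent either answers a query or hands it to another agent by name, and a hypothetical `inflation` agent computes a compound-inflation figure from numbers parsed out of the question. Agent names, the handler signature, and the parsing rules are all assumptions for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# A handler returns (answer, None) to finish, or (None, agent_name) to hand off.
Result = Tuple[Optional[str], Optional[str]]


@dataclass
class Agent:
    name: str
    handle: Callable[[str], Result]


def triage(query: str) -> Result:
    # Keyword routing stands in for a model deciding which agent should answer.
    if "inflation" in query.lower():
        return None, "inflation"
    return "Sorry, I can only route inflation questions.", None


def inflation(query: str) -> Result:
    # Parse "$<amount>", "<rate>%" and "<n> year(s)" from the question.
    amount = float(re.search(r"\$(\d+(?:\.\d+)?)", query).group(1))
    rate = float(re.search(r"(\d+(?:\.\d+)?)%", query).group(1)) / 100
    years = int(re.search(r"(\d+) years?", query).group(1))
    return f"${amount * (1 + rate) ** years:.2f}", None


AGENTS: Dict[str, Agent] = {
    "triage": Agent("triage", triage),
    "inflation": Agent("inflation", inflation),
}


def run_workflow(agents: Dict[str, Agent], start: str, query: str, max_hops: int = 5) -> str:
    """Pass the query between agents until one of them returns an answer."""
    current = start
    for _ in range(max_hops):
        answer, next_agent = agents[current].handle(query)
        if answer is not None:
            return answer
        if next_agent not in agents:
            raise ValueError(f"unknown handoff target: {next_agent}")
        current = next_agent
    raise RuntimeError("exceeded max_hops without an answer")
```

Here `run_workflow(AGENTS, "triage", "What will $100 cost after 2 years of 3% inflation?")` hands off once and returns `"$106.09"`. In a real deployment each handler would wrap a model call rather than a keyword check.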

Recommended read:

Top link: AI News | VentureBeat

References :

Simon Willison: Mistral's first reasoning LLM - Magistral - was released today and is available in two sizes, an open weights (Apache 2) 24B model called Magistral Small and an API/hosted only model called Magistral Medium.
Simon Willison's Weblog: Mistral's first reasoning model is out today, in two sizes. There's a 24B Apache 2 licensed open-weights model called Magistral Small (actually Magistral-Small-2506), and a larger API-only model called Magistral Medium.
THE DECODER: Mistral launches Europe's first reasoning model Magistral but lags behind competitors
AI News | VentureBeat: The company is signaling that the future of reasoning AI will be both powerful and, in a meaningful way, open to all.
www.marktechpost.com: How to Create Smart Multi-Agent Workflows Using the Mistral Agents API’s Handoffs Feature
TestingCatalog: Mistral AI debuts Magistral models focused on advanced reasoning
www.artificialintelligence-news.com: Mistral AI has pulled back the curtain on Magistral, their first model specifically built for reasoning tasks.
www.infoworld.com: Mistral AI unveils Magistral reasoning model
AI News: Mistral AI has pulled back the curtain on Magistral, their first model specifically built for reasoning tasks.
the-decoder.com: The French start-up Mistral is launching its first reasoning model on the market with Magistral. It is designed to enable logical thinking in European languages.
Simon Willison: Mistral's first reasoning LLM - Magistral - was released today and is available in two sizes, an open weights (Apache 2) 24B model called Magistral Small and an API/hosted only model called Magistral Medium. My notes here, including running Small locally with Ollama and accessing Medium via my llm-mistral plugin
SiliconANGLE: Mistral AI debuts new Magistral series of reasoning LLMs.
siliconangle.com: Mistral AI debuts new Magistral series of reasoning LLMs
MarkTechPost: Mistral AI Releases Magistral Series: Advanced Chain-of-Thought LLMs for Enterprise and Open-Source Applications
www.marktechpost.com: Mistral AI Releases Magistral Series: Advanced Chain-of-Thought LLMs for Enterprise and Open-Source Applications
WhatIs: What differentiates Mistral AI reasoning model Magistral
AlternativeTo: Mistral AI debuts Magistral: a transparent, multilingual reasoning model family, including open-source Magistral Small available on Hugging Face and enterprise-focused Magistral Medium available on various platforms.

Mark Tyson@tomshardware.com //

OpenAI Launches o3-pro Reasoning Model - OpenAI released o3-pro, a smarter model with enhanced capabilities in math, science, and programming, available to ChatGPT Pro and Team subscribers and through OpenAI's API, priced 87% below o1-pro.

References: Maginative, AI News | VentureBeat, THE DECODER ...

OpenAI has recently launched its newest reasoning model, o3-pro, making it available to ChatGPT Pro and Team subscribers, as well as through OpenAI’s API. Enterprise and Edu subscribers will gain access the following week. The company touts o3-pro as a significant upgrade, emphasizing its enhanced capabilities in mathematics, science, and coding, and its improved ability to utilize external tools.

OpenAI has also cut the price of o3 by 80% and priced o3-pro 87% below its predecessor, o1-pro, positioning the model as a more accessible option for developers seeking advanced reasoning capabilities. This price adjustment comes at a time when AI providers are competing more aggressively on both performance and affordability. In OpenAI's expert evaluations, reviewers consistently preferred o3-pro over the standard o3 model in every tested category, especially science, programming, and business tasks.
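Percentage cuts are easiest to reason about as arithmetic on per-token prices. The sketch below uses a hypothetical $10.00-per-million-token base price and hypothetical $2 input / $8 output rates (check OpenAI's pricing page for real figures), and assumes separate per-million-token prices for input and output. Because OpenAI bills a reasoning model's hidden reasoning tokens as output tokens, the output rate usually dominates the cost of a request.

```python
def discounted(price: float, cut_pct: float) -> float:
    """Price after a percentage cut; an 80% cut keeps 20% of the price."""
    return price * (100 - cut_pct) / 100


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


base = 10.00                    # hypothetical base price, $ per 1M tokens
print(discounted(base, 80))     # 2.0
print(discounted(base, 87))     # 1.3

# Hypothetical request: 2,000 prompt tokens, 30,000 output (incl. reasoning) tokens.
print(request_cost(2_000, 30_000, 2.0, 8.0))  # 0.244
```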

o3-pro uses the same underlying architecture as o3, but it’s tuned to be more reliable, especially on complex tasks, with better long-range reasoning. The model supports tools such as web browsing, code execution, vision analysis, and memory. The extra work can make responses slower, but OpenAI suggests the tradeoff is worthwhile for the most challenging questions, “where reliability matters more than speed, and waiting a few minutes is worth the tradeoff.”

Recommended read:

Top link: tomshardware.com

References :

Maginative: OpenAI’s new o3-pro model is now available in ChatGPT and the API, offering top-tier performance in math, science, and coding—at a dramatically lower price.
AI News | VentureBeat: OpenAI's most powerful reasoning model, o3, is now 80% cheaper, making it more affordable for businesses, researchers, and individual developers.
Latent.Space: OpenAI just dropped the price of their o3 model by 80% today and launched o3-pro.
THE DECODER: OpenAI has lowered the price of its o3 language model by 80 percent, CEO Sam Altman said.
Simon Willison's Weblog: OpenAI's Adam Groth explained that the engineers have optimized inference, allowing a significant price reduction for the o3 model.
the-decoder.com: OpenAI lowered the price of its o3 language model by 80 percent, CEO Sam Altman said.
AI News | VentureBeat: OpenAI released the latest in its o-series of reasoning model that promises more reliable and accurate responses for enterprises.
bsky.app: The OpenAI API is back to running at 100% again, plus we dropped o3 prices by 80% and launched o3-pro - enjoy!
Sam Altman: We are past the event horizon; the takeoff has started. Humanity is close to building digital superintelligence, and at least so far it’s much less weird than it seems like it should be.
siliconangle.com: OpenAI’s newest reasoning model o3-pro surpasses rivals on multiple benchmarks, but it’s not very fast
SiliconANGLE: OpenAI’s newest reasoning model o3-pro surpasses rivals on multiple benchmarks, but it’s not very fast
bsky.app: the OpenAI API is back to running at 100% again, plus we dropped o3 prices by 80% and launched o3-pro - enjoy!
bsky.app: OpenAI has launched o3-pro. The new model is available to ChatGPT Pro and Team subscribers and in OpenAI’s API now, while Enterprise and Edu subscribers will get access next week. If you use reasoning models like o1 or o3, try o3-pro, which is much smarter and better at using external tools.
The Algorithmic Bridge: OpenAI o3-Pro Is So Good That I Can’t Tell How Good It Is
datafloq.com: What is OpenAI o3 and How is it Different than other LLMs?
www.marketingaiinstitute.com: [The AI Show Episode 153]: OpenAI Releases o3-Pro, Disney Sues Midjourney, Altman: “Gentle Singularity” Is Here, AI and Jobs & News Sites Getting Crushed by AI Search

Carl Franzen@AI News | VentureBeat //

Mistral AI Launches First Reasoning Model Magistral - Mistral AI released Magistral, its first reasoning LLM, in two versions: Magistral Small (24B parameters, open weights) and Magistral Medium (API-only), promoting it for traceable reasoning and creative writing.

References: AI News | VentureBeat, Simon Willison, the-decoder.com ...

Mistral AI has launched Magistral, its inaugural reasoning large language model (LLM), available in two distinct versions. Magistral Small, a 24-billion-parameter model, is offered with open weights under the Apache 2.0 license, enabling developers to freely use, modify, and distribute the model for commercial or non-commercial purposes. This model can be run locally using tools like Ollama. The other version, Magistral Medium, is accessible exclusively via Mistral’s API and is tailored for enterprise clients, providing traceable reasoning capabilities crucial for compliance in highly regulated sectors such as legal, financial, healthcare, and government.
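For readers who want to try the local route, here is a minimal Python sketch of calling Magistral Small through Ollama's local REST API (`POST /api/generate` on the default port 11434). It assumes Ollama is installed and running and that the model has already been pulled; the model tag `magistral` is an assumption, so check `ollama list` for the exact name on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "magistral"  # assumption: replace with the tag shown by `ollama list`


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running Ollama)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full completion, which keeps the client code short; streaming would instead deliver the text as a sequence of JSON chunks.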

Mistral is positioning Magistral as a powerful tool for both professional and creative applications. The company highlights Magistral's ability to perform "transparent, multilingual reasoning," making it suitable for tasks involving complex calculations, programming logic, decision trees, and rule-based systems. Additionally, Mistral is promoting Magistral for creative writing, touting its capacity to generate coherent or, if desired, uniquely eccentric content. Users can experiment with Magistral Medium through the "Thinking" mode within Mistral's Le Chat platform, with options for "Pure Thinking" and a high-speed "10x speed" mode powered by Cerebras.

Benchmark tests suggest that Magistral Medium is competitive in the reasoning arena. On the AIME-24 mathematics benchmark, the model achieved 73.6% accuracy in Mistral's internal tests, which compared it with its predecessor, Mistral Medium 3, and with DeepSeek's models. Mistral's strategic release of Magistral Small under the Apache 2.0 license is seen as a reaffirmation of its commitment to open-source principles. This move contrasts with the company's previous release of Medium 3 as a proprietary offering, which had raised concerns about a shift toward a more closed ecosystem.

Recommended read:

Top link: AI News | VentureBeat

References :

AI News | VentureBeat: Mistral's first reasoning model, Magistral, launches with large and small Apache 2.0 version.
Simon Willison: Mistral's first reasoning LLM - Magistral - was released today and is available in two sizes, an open weights (Apache 2) 24B model called Magistral Small and an API/hosted only model called Magistral Medium. My notes here, including running Small locally with Ollama and accessing Medium via my llm-mistral plugin
Simon Willison's Weblog: Magistral - the first reasoning model by Mistral AI
the-decoder.com: Mistral launches Europe's first reasoning model Magistral but lags behind competitors
SiliconANGLE: Mistral AI debuts new Magistral series of reasoning LLMs
MarkTechPost: Mistral AI Releases Magistral Series: Advanced Chain-of-Thought LLMs for Enterprise and Open-Source Applications
TestingCatalog: Mistral AI debuts Magistral models focused on advanced reasoning
siliconangle.com: Mistral AI SAS today introduced Magistral, a new lineup of reasoning-optimized large language models. The LLM series includes two algorithms on launch.
www.artificialintelligence-news.com: Mistral AI challenges big tech with reasoning model
www.marktechpost.com: Mistral AI Releases Magistral Series: Advanced Chain-of-Thought LLMs for Enterprise and Open-Source Applications
WhatIs: What differentiates Mistral AI reasoning model Magistral

@medium.com //

DeepSeek R1-0528 Excels in Math and Reasoning - DeepSeek R1-0528 is a new reasoning model outperforming rivals in math and reasoning, attributed to architecture and training data, posing a challenge to established closed models.

References: TheSequence, thezvi.substack.com ...

DeepSeek's latest AI model, R1-0528, is making waves in the AI community due to its impressive performance in math and reasoning tasks. Despite a name very similar to its predecessor's, it is effectively a different model with a very different performance profile, marking a significant leap forward. The original R1, released in January, drew "unprecedented levels of demand," shooting to the top of the App Store past closed-model rivals and overloading DeepSeek's API to the point that the company had to stop accepting payments.

The most notable improvement in DeepSeek R1-0528 is in mathematical reasoning. On the AIME 2025 test, the model's accuracy increased from 70% to 87.5%, surpassing Gemini 2.5 Pro and putting it in close competition with OpenAI's o3. This improvement is attributed to "enhanced thinking depth": the model uses significantly more tokens per question and engages in longer chains of reasoning, which gives it room to check its own work, recognize errors, and course-correct during problem-solving.

DeepSeek's success is challenging established closed models and driving competition in the AI landscape. DeepSeek-R1-0528 continues to use a Mixture-of-Experts (MoE) architecture at very large scale. Sparse activation lets different experts specialize while only a fraction of the parameters run for each token, which keeps inference efficient. The context window remains 128K tokens (RoPE scaling or similar techniques could extend it further). The rise of DeepSeek is underscored by its benchmark results, which show it outperforming some of the industry’s leading models, including OpenAI’s ChatGPT. The release of a distilled variant, R1-0528-Qwen3-8B, also makes this technology broadly accessible.
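Sparse activation is easiest to see in a toy router. The sketch below shows generic top-k softmax gating, in which a router scores every expert but only the k best-scoring experts actually run. It is a textbook illustration with made-up scalar "experts" and hand-picked router scores, not DeepSeek's actual routing scheme, which differs in its details.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_gate(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize over them."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x: float, experts: List[Callable[[float], float]],
                scores: Sequence[float], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; the others never run."""
    return sum(w * experts[i](x) for i, w in top_k_gate(scores, k))


experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
scores = [0.0, 1.0, 1.0, -5.0]  # hypothetical router scores for one token
print(top_k_gate(scores, 2))             # [(1, 0.5), (2, 0.5)]
print(moe_forward(1.0, experts, scores))  # 2.5
```

Only experts 1 and 2 are evaluated here; in a real MoE layer the skipped experts are large feed-forward blocks, which is where the compute savings come from.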

Recommended read:

Top link: medium.com

References :

: The 'Minor Upgrade' That's Anything But: DeepSeek R1-0528 Deep Dive
TheSequence: The Sequence Radar #554 : The New DeepSeek R1-0528 is Very Impressive
lambda.ai: DeepSeek-R1-0528: The Open-Source Titan Now Live on Lambda’s Inference API
thezvi.substack.com: DeepSeek-r1-0528 Did Not Have a Moment
thezvi.wordpress.com: DeepSeek-r1-0528 Did Not Have a Moment

@www.eweek.com //

Meta's AI Accuracy Study and Military Tech Collaboration - Meta partners with Anduril to build AR/VR devices that enhance soldiers' situational awareness, and a Meta study finds AI accuracy improves by up to 34.5% with shorter reasoning chains.

References: The Register - Security, Quartz, eWEEK ...

Meta is making a significant move into military technology, partnering with Anduril Industries to develop augmented and virtual reality (XR) devices for the U.S. Army. This collaboration reunites Meta with Palmer Luckey, the founder of Oculus who was previously fired from the company. The initiative aims to provide soldiers with enhanced situational awareness on the battlefield through advanced perception capabilities and AI-enabled combat tools. The devices, potentially named EagleEye, will integrate Meta's Llama AI models with Anduril's Lattice system to deliver real-time data and improve operational coordination.

The new XR headsets are designed to support real-time threat detection, such as identifying approaching drones or concealed enemy positions. They will also provide interfaces for operating AI-powered weapon systems. Anduril states that the project will save the U.S. military billions of dollars by using high-performance components and technology originally developed for commercial use. The partnership reflects a broader trend of Meta aligning more closely with national security interests.

In related news, Meta's research team has made a surprising discovery: shorter reasoning chains can significantly improve AI accuracy. A study by Meta and The Hebrew University of Jerusalem found that, for a given question, a model's shortest reasoning chain was up to 34.5% more likely to be correct than its longest one. This challenges the conventional belief that longer, more complex reasoning chains lead to better results. The researchers developed a method called "short-m@k," which runs k reasoning attempts in parallel, halts computation once the first m have finished, and selects the final answer by majority vote among those m. This method could reduce computing costs by up to 40% while maintaining performance levels.
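The voting step of short-m@k can be sketched in a few lines of Python. This is a simplified reading of the method as described above, not Meta's code: each of the k parallel samples is represented by a hypothetical `(length_in_tokens, final_answer)` pair, shorter chains are assumed to finish first, and ties are broken in favor of the answer whose chain finished earliest, which is an assumption of this sketch.

```python
from collections import Counter
from typing import List, Tuple


def short_m_at_k(chains: List[Tuple[int, str]], m: int) -> str:
    """Majority vote over the m shortest of k sampled reasoning chains.

    chains holds (length_in_tokens, final_answer) for k parallel samples.
    Shorter chains are assumed to finish first, so sorting by length
    simulates stopping generation once m chains have completed.
    """
    if not chains or m < 1:
        raise ValueError("need at least one chain and m >= 1")
    finished = sorted(chains, key=lambda c: c[0])[:m]
    votes = Counter(answer for _, answer in finished)
    best = max(votes.values())
    # Tie-break: prefer the answer whose chain finished first.
    for _, answer in finished:
        if votes[answer] == best:
            return answer
    raise AssertionError("unreachable")


samples = [(900, "42"), (300, "41"), (500, "42"), (1200, "41"), (400, "42")]
print(short_m_at_k(samples, m=3))  # 42 (votes from the 300-, 400- and 500-token chains)
print(short_m_at_k(samples, m=2))  # 41 (1-1 tie, the shortest chain wins)
```

In a live system the chains not among the first m would be cancelled mid-generation, so their remaining tokens are never computed.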

Recommended read:

Top link: www.eweek.com

References :

The Register - Security: Giving people the power to build community and bring the world closer together so we can shoot them: Meta has partnered with Anduril Industries to build augmented and virtual reality devices for the military, eight years after it fired the defense firm's founder, Palmer Luckey.
Quartz: American soldiers on the battlefield will soon be receiving a boost from Facebook.
venturebeat.com: New research from Meta reveals AI models achieve 34.5% better accuracy with shorter reasoning chains, challenging industry assumptions and potentially reducing computing costs by 40%.
eWEEK: Meta is developing extended reality headsets tailored for military use, designed to enhance soldiers’ situational awareness on the battlefield.

@www.marktechpost.com //

DeepSeek Updates R1 Model, Boosts Reasoning Capability - DeepSeek has released R1-0528, an updated open-source reasoning AI model, enhancing its capabilities and positioning itself as a strong open-source alternative to models like OpenAI’s o3 and Google’s Gemini 2.5 Pro.

References: pub.towardsai.net, AI News | VentureBeat, Kyle Wiggers ...

DeepSeek, a Chinese AI startup, has launched an updated version of its R1 reasoning AI model, named DeepSeek-R1-0528. This new iteration brings the open-source model near parity with proprietary paid models like OpenAI’s o3 and Google’s Gemini 2.5 Pro in terms of reasoning capabilities. The model is released under the permissive MIT License, enabling commercial use and customization, marking a commitment to open-source AI development. The model's weights and documentation are available on Hugging Face, facilitating local deployment and API integration.

The DeepSeek-R1-0528 update introduces substantial enhancements in the model's ability to handle complex reasoning tasks across various domains, including mathematics, science, business, and programming. DeepSeek attributes these improvements to leveraging increased computational resources and applying algorithmic optimizations in post-training. Notably, the accuracy on the AIME 2025 test has surged from 70% to 87.5%, demonstrating deeper reasoning processes with an average of 23,000 tokens per question, compared to the previous version's 12,000 tokens.
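The two headline numbers can be put side by side with a little arithmetic. Using only the figures reported above (12,000 vs. 23,000 tokens per question; 70% vs. 87.5% accuracy on AIME 2025), roughly 1.9 times as many tokens coincided with the error rate falling from 30% to 12.5%, about 58% fewer wrong answers:

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


tokens_old, tokens_new = 12_000, 23_000  # average tokens per question
acc_old, acc_new = 0.70, 0.875           # AIME 2025 accuracy

token_ratio = tokens_new / tokens_old                     # about 1.92x tokens
error_change = relative_change(1 - acc_old, 1 - acc_new)  # about -0.583

print(f"{token_ratio:.2f}x tokens, {error_change:.1%} change in error rate")
# 1.92x tokens, -58.3% change in error rate
```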

Alongside enhanced reasoning, the updated R1 model boasts a reduced hallucination rate, which contributes to more reliable and consistent output. Code generation performance has also seen a boost, positioning it as a strong contender in the open-source AI landscape. DeepSeek provides instructions on its GitHub repository for those interested in running the model locally and encourages community feedback and questions. The company aims to provide accessible AI solutions, underscored by the availability of a distilled version of R1-0528, DeepSeek-R1-0528-Qwen3-8B, designed for efficient single-GPU operation.

Recommended read:

Top link: www.marktechpost.com

References :

pub.towardsai.net: DeepSeek R1: Is It Right For You? (A Practical Self-Assessment for Businesses and Individuals)
AI News | VentureBeat: DeepSeek R1-0528 arrives in powerful open source challenge to OpenAI o3 and Google Gemini 2.5 Pro
Analytics Vidhya: New Deepseek R1-0528 Update is INSANE
Kyle Wiggers: DeepSeek updates its R1 reasoning AI model, releases it on Hugging Face
MacStories: Details about DeepSeek's R1-0528 model and its improved performance.
MarkTechPost: Information about DeepSeek's R1-0528 model and its enhancements in math and code performance.
www.marktechpost.com: DeepSeek, the Chinese AI Unicorn, has released an updated version of its R1 reasoning model, named DeepSeek-R1-0528. This release enhances the model’s capabilities in mathematics, programming, and general logical reasoning, positioning it as a formidable open-source alternative to leading models like OpenAI’s o3 and Google’s Gemini 2.5 Pro. Technical Enhancements: The R1-0528 update introduces significant […]
www.analyticsvidhya.com: When DeepSeek R1 launched in January, it instantly became one of the most talked-about open-source models on the scene, gaining popularity for its sharp reasoning and impressive performance. Fast-forward to today, and DeepSeek is back with a so-called “minor trial upgrade”, but don’t let the modest name fool you. DeepSeek-R1-0528 delivers major leaps in reasoning, […]
: The 'Minor Upgrade' That's Anything But: DeepSeek R1-0528 Deep Dive
Simon Willison: Some notes on the new DeepSeek-R1-0528 - a completely different model from the R1 they released in January, despite having a very similar name. Terrible LLM naming has managed to infect the Chinese AI labs too
TheSequence: The Sequence Radar #554 : The New DeepSeek R1-0528 is Very Impressive
Fello AI: In late May 2025, Chinese startup DeepSeek quietly rolled out R1-0528, a beefed-up version of its open-source R1 reasoning model.
felloai.com: Latest DeepSeek Update Called R1-0528 Is Matching OpenAI’s o3 & Gemini 2.5 Pro

@www.marktechpost.com //

DeepSeek R1-0528 Update Improves Reasoning and Coding - DeepSeek released DeepSeek-R1-0528, with improved performance in math, coding, and general reasoning and is a competitive open-source alternative to models like OpenAI’s o3 and Google’s Gemini 2.5 Pro.

References: Kyle Wiggers, AI News | VentureBeat, MacStories ...

DeepSeek has released a major update to its R1 reasoning model, dubbed DeepSeek-R1-0528, marking a significant step forward in open-source AI. The update boasts enhanced performance in complex reasoning, mathematics, and coding, positioning it as a strong competitor to leading commercial models like OpenAI's o3 and Google's Gemini 2.5 Pro. The model's weights, training recipes, and comprehensive documentation are openly available under the MIT license, fostering transparency and community-driven innovation. This release allows researchers, developers, and businesses to access cutting-edge AI capabilities without the constraints of closed ecosystems or expensive subscriptions.

The DeepSeek-R1-0528 update brings several core improvements. The model's parameter count has increased from 671 billion to 685 billion, enabling it to process and store more intricate patterns. Enhanced chain-of-thought layers deepen the model's reasoning capabilities, making it more reliable in handling multi-step logic problems. Post-training optimizations have also been applied to reduce hallucinations and improve output stability. In practical terms, the update introduces JSON outputs, native function calling, and simplified system prompts, all designed to streamline real-world deployment and enhance the developer experience.
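Native function calling means the model can return a structured request to run a tool instead of plain text, and the application then executes it. The sketch below dispatches one such call. It assumes the OpenAI-compatible message shape (`{"function": {"name": ..., "arguments": "<JSON string>"}}`) that DeepSeek's API is designed to mirror, and the `get_weather` tool is a hypothetical stub; check DeepSeek's API documentation for the exact fields before relying on this.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# Registry of the only tools the application is willing to execute.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: Dict[str, Any]) -> Any:
    """Execute one OpenAI-style tool call and return the tool's result."""
    fn = tool_call["function"]
    name = fn["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested an unknown tool: {name}")
    # Arguments arrive as a JSON-encoded string, not as a dict.
    args = json.loads(fn["arguments"])
    return TOOLS[name](**args)


call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
print(dispatch_tool_call(call))  # Sunny in Paris
```

The tool's result would then be sent back to the model in a follow-up message so it can finish its answer. Keeping an explicit registry means the model can only trigger functions the application has chosen to expose.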

Specifically, DeepSeek R1-0528 demonstrates a remarkable leap in mathematical reasoning. On the AIME 2025 test, its accuracy improved from 70% to an impressive 87.5%, rivaling OpenAI's o3. This improvement is attributed to "enhanced thinking depth," with the model now utilizing significantly more tokens per question, indicating more thorough and systematic logical analysis. The open-source nature of DeepSeek-R1-0528 empowers users to fine-tune and adapt the model to their specific needs, fostering further innovation and advancements within the AI community.

Recommended read:

Top link: www.marktechpost.com

References :

Kyle Wiggers: DeepSeek updates its R1 reasoning AI model, releases it on Hugging Face
AI News | VentureBeat: VentureBeat article on DeepSeek R1-0528.
Analytics Vidhya: New Deepseek R1-0528 Update is INSANE
MacStories: Testing DeepSeek R1-0528 on the M3 Ultra Mac Studio and Installing Local GGUF Models with Ollama on macOS
www.analyticsvidhya.com: New Deepseek R1-0528 Update is INSANE
www.marktechpost.com: DeepSeek Releases R1-0528: An Open-Source Reasoning AI Model Delivering Enhanced Math and Code Performance with Single-GPU Efficiency
NextBigFuture.com: DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training.
MarkTechPost: DeepSeek Releases R1-0528: An Open-Source Reasoning AI Model Delivering Enhanced Math and Code Performance with Single-GPU Efficiency
pandaily.com: In the early hours of May 29, Chinese AI startup DeepSeek quietly open-sourced the latest iteration of its R1 large language model, DeepSeek-R1-0528, on the Hugging Face platform.
www.computerworld.com: Reports that DeepSeek releases a new version of its R1 reasoning AI model.
techcrunch.com: DeepSeek updates its R1 reasoning AI model, releases it on Hugging Face
the-decoder.com: Deepseek's R1 model closes the gap with OpenAI and Google after major update
Simon Willison: Some notes on the new DeepSeek-R1-0528 - a completely different model from the R1 they released in January, despite having a very similar name. Terrible LLM naming has managed to infect the Chinese AI labs too
Analytics India Magazine: The new DeepSeek-R1 Is as good as OpenAI o3 and Gemini 2.5 Pro
: The 'Minor Upgrade' That's Anything But: DeepSeek R1-0528 Deep Dive
simonwillison.net: Some notes on the new DeepSeek-R1-0528 - a completely different model from the R1 they released in January, despite having a very similar name. Terrible LLM naming has managed to infect the Chinese AI labs too
TheSequence: This article provides an overview of the new DeepSeek R1-0528 model and notes its improvements over the prior model released in January.
Kyle Wiggers: News about the release of DeepSeek's updated R1 AI model, emphasizing its increased censorship.
Fello AI: Reports that the R1-0528 model from DeepSeek is matching the capabilities of OpenAI's o3 and Google's Gemini 2.5 Pro.
felloai.com: Latest DeepSeek Update Called R1-0528 Is Matching OpenAI’s o3 & Gemini 2.5 Pro
www.tomsguide.com: DeepSeek’s latest update is a serious threat to ChatGPT and Google - here’s why

Eric Hal@techradar.com //

Google Gemini Enhanced with New AI Capabilities and Integrations - Google launched AI Mode, integrated with Gemini 2.5, offering more detailed results, enhanced data visualization, and virtual shopping assistance, alongside a NotebookLM mobile app with AI-generated audio summaries, as it competes with AI chatbots like ChatGPT.

References: Search Engine Journal, www.techradar.com, www.tomsguide.com ...

Google I/O 2025 saw the unveiling of 'AI Mode' for Google Search, signaling a significant shift in how the company approaches information retrieval and user experience. The new AI Mode, powered by the Gemini 2.5 model, is designed to offer more detailed results, personal context, and intelligent assistance. This upgrade aims to compete directly with the capabilities of AI chatbots like ChatGPT, providing users with a more conversational and comprehensive search experience. The rollout has commenced in the U.S. for both the browser version of Search and the Google app, although availability in other countries remains unconfirmed.

AI Mode brings several key features to the forefront, including Deep Search, Search Live, and AI-powered agents. Deep Search lets users research a topic in depth by running hundreds of searches simultaneously and generating expert-level, fully cited reports in minutes. With Search Live, users can use their phone's camera to interact with Search in real time and receive context-aware responses from Gemini. Google is also bringing agentic capabilities to Search, allowing users to perform tasks like booking tickets and making reservations directly through the AI interface.

Google’s revamp of its AI search service appears to be a response to the growing popularity of AI-driven search experiences offered by companies like OpenAI and Perplexity. According to Gartner analyst Chirag Dekate, evidence suggests a greater reliance on search and AI-infused search experiences. As AI Mode rolls out, Google is encouraging website owners to optimize their content for AI-powered search by creating unique, non-commodity content and ensuring that their sites meet technical requirements and provide a good user experience.

Recommended read:

Top link: techradar.com

References:

Search Engine Journal: Google's new AI Mode in Search, integrating Gemini 2.5, aims to enhance user interaction by providing more conversational and comprehensive responses.
www.techradar.com: Google just got a new 'Deep Think' mode - and 6 other upgrades
WhatIs: Google expands Gemini model, Search as AI rivals encroach
www.tomsguide.com: Google Search gets an AI tab - here’s what it means for your searches
AI News | VentureBeat: Inside Google’s AI leap: Gemini 2.5 thinks deeper, speaks smarter and codes faster
Search Engine Journal: Google Gemini upgrades include Chrome integration, Live visual tools, and enhanced 2.5 models. Learn how these AI advances could reshape your marketing strategy.
Google DeepMind Blog: Gemini 2.5: Our most intelligent models are getting even better
learn.aisingapore.org: Updates to Gemini 2.5 from Google DeepMind
THE DECODER: Google upgrades Gemini 2.5 Pro with a new Deep Think mode for advanced reasoning abilities
www.techradar.com: I've been using Google's new AI mode for Search - here's how to master it
www.theguardian.com: Search engine revamp and Gemini 2.5 introduced at conference in latest showing tech giant is all in on AI. Google on Tuesday unleashed another wave of technology to accelerate a year-long makeover of its search engine that is changing the way people get information and curtailing the flow of internet traffic to other websites.
www.analyticsvidhya.com: Google I/O 2025: AI Mode on Google Search, Veo 3, Imagen 4, Flow, Gemini Live, and More
techvro.com: Google AI Mode Promises Deep Search and Goes Beyond AI Overviews
THE DECODER: Google pushes AI-powered search with agents, multimodality, and virtual shopping
felloai.com: Google I/O 2025 Recap With All The Jaw-Dropping AI Announcements
AI Talent Development: Gemini as a universal AI assistant
AI & Machine Learning: Today at Google I/O, we're expanding that help enterprises build more sophisticated and secure AI-driven applications and agents
www.techradar.com: Google Gemini 2.5 Flash promises to be your favorite AI chatbot, but how does it compare to ChatGPT 4o?
www.laptopmag.com: From $250 AI subscriptions to futuristic glasses and search that talks back, here’s what people are saying about Tuesday's Google I/O.
www.tomsguide.com: Google’s Gemini AI can now access Gmail, Docs, Drive, and more to deliver personalized help - but it raises new privacy concerns.
Data Phoenix: Google updated its model lineup and introduced a 'Deep Think' reasoning mode for Gemini 2.5 Pro
Maginative: Google’s revamped Canvas, powered by the Gemini 2.5 Pro model, lets you turn ideas into apps, quizzes, podcasts, and visuals in seconds - no code required.
Tech News | Euronews RSS: The tech giant is introducing a new "AI mode" that will embed chatbot capabilities into its search engine to keep up with rivals like OpenAI's ChatGPT.
learn.aisingapore.org: Advancing Gemini’s security safeguards – Google DeepMind
Data Phoenix: Google has launched major Gemini updates, including free visual assistance via Gemini Live, new subscription tiers starting at $19.99/month, advanced creative tools like Veo 3 for video generation with native audio, and an upcoming autonomous Agent Mode for complex task management.
Latest news: Everything from Google I/O 2025 you might've missed: Gemini, smart glasses, and more
thetechbasic.com: Google now adds ads to AI Mode and AI Overviews in search

Last Week@Last Week in AI //

Anthropic's Claude Integrations Enhance AI Reasoning and Security - Anthropic is enhancing its Claude AI model through new integrations, security measures, and red team reviews to ensure safety and address malicious uses, focusing on vulnerabilities and improving functionality with app connections.

References: TestingCatalog, techcrunch.com

Anthropic is enhancing its Claude AI model through new integrations and security measures. A new Claude Neptune model is undergoing internal red-team reviews that probe its robustness against jailbreaking and test whether its safety protocols are effective. The red-team exercises are set to run until May 18 and focus particularly on vulnerabilities in the constitutional classifiers that underpin Anthropic’s safety measures. That focus suggests the model is more capable and more sensitive, and therefore requires more stringent pre-release testing.

Anthropic has also launched Integrations, a feature that lets users connect more apps to Claude. It is available in beta for subscribers to Anthropic’s Claude Max, Team, and Enterprise plans, with Pro support coming soon. Integrations builds on the company's Model Context Protocol (MCP), enabling Claude to draw data from business tools, content repositories, and app development environments so that it gains deep context about users' work.

Anthropic is also addressing the malicious uses of its Claude models, with a report outlining case studies on how threat actors have misused the models and the steps taken to detect and counter such misuse. One notable case involved an influence-as-a-service operation that used Claude to orchestrate social media bot accounts, deciding when to comment, like, or re-share posts. Anthropic has also observed cases of credential stuffing operations, recruitment fraud campaigns, and AI-enhanced malware generation, reinforcing the importance of ongoing security measures and sharing learnings with the wider AI ecosystem.

Recommended read:

Top link: Last Week in AI

References:

TestingCatalog: New Claude Neptune model undergoes red team review at Anthropic
techcrunch.com: Anthropic lets you connect apps to Claude

Coen van@Techzine Global //

ServiceNow Launches AI Control Tower for Management - ServiceNow has introduced AI Control Tower, a centralized control center for managing AI agents, models, and workflows, along with AI Agent Fabric and the Apriel Nemotron 15B model in partnership with Nvidia.

References: thenewstack.io, AI News | VentureBeat, NVIDIA Blog ...

ServiceNow has announced the launch of AI Control Tower, a centralized control center designed to manage, secure, and optimize AI agents, models, and workflows across an organization. Unveiled at Knowledge 2025 in Las Vegas, the platform provides a holistic view of the entire AI ecosystem, enabling enterprises to monitor and manage both ServiceNow and third-party AI agents from a single location. AI Control Tower aims to address the growing complexity of AI deployments by giving users one place to see all AI systems and their deployment status, helping ensure those systems are governed and their activities understood.

AI Control Tower offers enterprise-wide AI visibility, built-in compliance and AI governance, end-to-end lifecycle management of agentic processes, real-time reporting on different metrics, and improved alignment. It is designed to help AI systems administrators and other stakeholders monitor and manage every AI agent, model, or workflow in their environment. The platform also categorizes systems by provider and type, which helps users understand them and improves risk and compliance management.

In addition to the AI Control Tower, ServiceNow introduced AI Agent Fabric, facilitating communication between AI agents and partner integrations. ServiceNow has also partnered with NVIDIA to engineer an open-source model, Apriel Nemotron 15B, designed to drive advancements in enterprise large language models (LLMs) and power AI agents that support various enterprise workflows. The Apriel Nemotron 15B, developed using NVIDIA NeMo and ServiceNow domain-specific data, is engineered for reasoning, drawing inferences, weighing goals, and navigating rules in real time, making it efficient and scalable for concurrent enterprise workflows.

Recommended read:

Top link: Techzine Global

References:

thenewstack.io: Given that ServiceNow is, at its core, all about automating workflows for enterprises, it’s no surprise that
AI News | VentureBeat: ServiceNow also announced a way for agents to communicate with others along with its new observability platform.
Techzine Global: During Knowledge 2025 , ServiceNow launched AI Control Tower, a centralized control center for managing, securing, and optimizing AI agents, models, and workflows.
NVIDIA Blog: Your Service Teams Just Got a New Coworker - and It’s a 15B-Parameter Super Genius Built by ServiceNow and NVIDIA
Latest news: ServiceNow and Nvidia's new reasoning AI model raises the bar for enterprise AI agents
www.networkworld.com: ServiceNow unveiled a centralized command center the company says will enable enterprise customers to govern, manage, and secure AI agents from ServiceNow and other third-parties from a unified platform.
www.computerworld.com: Nvidia and ServiceNow have created an AI model that can help companies create learning AI agents to automate corporate workloads. The open-source Apriel model, available generally in the second quarter on HuggingFace, will help create AI agents that can make decisions around IT, human resources and customer-service functions.
blogs.nvidia.com: ServiceNow is accelerating enterprise AI with a new reasoning model built in partnership with NVIDIA - enabling AI agents that respond in real time, handle complex workflows and scale functions like IT, HR and customer service teams worldwide.
techstrong.ai: ServiceNow Inc. kicked off its annual artificial intelligence (AI) conference in Las Vegas Tuesday as it has in previous years -- with a fusillade of product announcements, partnerships and customer stories.
techstrong.ai: ServiceNow’s New AI Control Tower Commands AI Agents
Ken Yeung: ServiceNow Debuts AI Control Tower to Manage the Chaos of Enterprise AI Agents
Ken Yeung: ServiceNow and Nvidia have had a long-standing partnership building generative AI solutions for the enterprise. This week, at ServiceNow’s Knowledge customer conference, the two are introducing the latest fruits of their labor, a new large language model called Apriel Nemotron 15B with reasoning capabilities.
CIO Dive - Latest News: ServiceNow, Nvidia develop LLM to fuel enterprise agents
AI News: ServiceNow bets on unified AI to untangle enterprise complexity
www.marktechpost.com: ServiceNow AI Released Apriel-Nemotron-15b-Thinker: A Compact Yet Powerful Reasoning Model Optimized for Enterprise-Scale Deployment and Efficiency

Eric@techradar.com //

Google Gemini 2.5 Pro Beats Pokémon with AI Advancements - Google's Gemini 2.5 Pro AI model successfully completed Pokémon Blue with external help, demonstrating the AI's reasoning and problem-solving capabilities; Google is also testing an AI Mode that transforms Search into a Gemini-powered chatbot.

References: the-decoder.com, thetechbasic.com, THE DECODER ...

Google's Gemini 2.5 Pro AI model has achieved a significant milestone by completing the classic Game Boy game Pokémon Blue. The accomplishment, spearheaded by software engineer Joel Z, demonstrates the model's improved reasoning and problem-solving abilities. Google CEO Sundar Pichai celebrated the achievement online, highlighting it as a substantial win for AI development. The project shows how AI can handle complex tasks that require long-term planning, goal tracking, and visual navigation, capabilities considered vital in the pursuit of artificial general intelligence.

Joel Z facilitated Gemini's gameplay over several months, livestreaming the AI's progress. While Joel is not affiliated with Google, his efforts were supported by the company's leadership. To enable Gemini to navigate the game, Joel used an emulator, mGBA, to feed screenshots and game data, like character position and map layout. He also incorporated smaller AI helpers, like a "Pathfinder" and a "Boulder Puzzle Solver," to tackle particularly challenging segments. These sub-agents, also versions of Gemini, were deployed strategically by the AI to manage complex situations, showcasing its ability to differentiate between routine and complicated tasks.

Google is also experimenting with turning its search engine into a Gemini-powered chatbot via AI Mode. This feature, currently being tested with a small percentage of U.S. users, delivers conversational answers generated from Google's vast index, effectively turning Search into an answer engine. Instead of a list of links, AI Mode provides rich, visual summaries and remembers prior queries, competing directly with the search features of Perplexity and ChatGPT. While this shift could affect organic SEO tactics, it signals Google's commitment to integrating AI more deeply into its core products and offering users a more intuitive and informative search experience.

Recommended read:

Top link: techradar.com

References:

thetechbasic.com: Google’s powerful AI model, Gemini 2.5 Pro, has finished playing the old Game Boy game Pokémon Blue.
TestingCatalog: Discusses Google's upcoming Gemini web updates, including new tools like Memory, Veo 2, and the Gemini Ultra tier.

Apple Research Paper on AI Reasoning Faces Criticism - Apple researchers published a paper questioning the reasoning abilities of LLMs, arguing they rely on pattern matching rather than true reasoning, but critics argue that the experiments were unfairly designed.

Mistral AI Launches Magistral Reasoning Models Openly - Mistral AI launched its first reasoning model, Magistral, in two versions, an open-weight Magistral Small released under Apache 2.0 and a larger Magistral Medium, and introduced a Handoffs feature in its Mistral Agents API for smart, multi-agent workflows.

OpenAI Launches o3-pro Reasoning Model - OpenAI released o3-pro, a smarter model with enhanced capabilities in math, science, and programming, available to ChatGPT Pro and Team subscribers and through OpenAI's API, priced about 87% below o1-pro.

Mistral AI Launches First Reasoning Model Magistral - Mistral AI released Magistral, its first reasoning LLM, in two versions: Magistral Small (24B parameters, open weights) and Magistral Medium (API-only), promoting it for traceable reasoning and creative writing.

DeepSeek R1-0528 Excels in Math and Reasoning - DeepSeek R1-0528, a new reasoning model, outperforms rivals in math and reasoning, gains attributed to its architecture and training data, and poses a challenge to established closed models.

Meta's AI Accuracy Study and Military Tech Collaboration - Meta is partnering with Anduril to build AR/VR devices that enhance soldiers' situational awareness; separately, a Meta study finds that AI accuracy improves by 34% with shorter reasoning chains.

DeepSeek Updates R1 Model, Boosts Reasoning Capability - DeepSeek has released R1-0528, an updated open-source reasoning AI model with enhanced capabilities, positioning it as a strong open-source alternative to models like OpenAI’s o3 and Google’s Gemini 2.5 Pro.

DeepSeek R1-0528 Update Improves Reasoning and Coding - DeepSeek released DeepSeek-R1-0528, which improves performance in math, coding, and general reasoning and is a competitive open-source alternative to models like OpenAI’s o3 and Google’s Gemini 2.5 Pro.
