Carl Franzen@AI News | VentureBeat
//
Mistral AI has launched its first reasoning model, Magistral, signaling a commitment to open-source AI development. The Magistral family features two models: Magistral Small, a 24-billion parameter model available with open weights under the Apache 2.0 license, and Magistral Medium, a proprietary model accessible through an API. This dual release strategy aims to cater to both enterprise clients seeking advanced reasoning capabilities and the broader AI community interested in open-source innovation.
Mistral's decision to release Magistral Small under the permissive Apache 2.0 license marks a significant return to its open-source roots. The license allows for the free use, modification, and distribution of the model's source code, even for commercial purposes. This empowers startups and established companies to build and deploy their own applications on top of Mistral’s latest reasoning architecture, without the burdens of licensing fees or vendor lock-in. The release serves as a powerful counter-narrative, reaffirming Mistral’s dedication to arming the open community with cutting-edge tools.
Magistral Medium demonstrates competitive performance in the reasoning arena, according to internal benchmarks released by Mistral, which tested the model against its predecessor, Mistral Medium 3, and models from DeepSeek. Separately, the Handoffs feature of Mistral's Agents API facilitates smart, multi-agent workflows, allowing different agents to collaborate on complex tasks. This enables modular and efficient problem-solving, as demonstrated in systems where agents cooperate to answer inflation-related questions.
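The handoff pattern behind such workflows can be sketched in plain Python. This is an illustrative sketch of the general pattern only, not the actual Mistral Agents API; the agent names and routing rules below are invented for illustration:

```python
# Minimal sketch of the multi-agent handoff pattern: each agent either
# answers a request itself or hands it off to a more specialized agent.
# Agent names and routing logic are invented; the real Mistral Agents API
# declares handoffs on server-side agents rather than in client code.

class Agent:
    def __init__(self, name, can_handle, respond, handoff_to=None):
        self.name = name
        self.can_handle = can_handle    # predicate over the incoming query
        self.respond = respond          # produces an answer string
        self.handoff_to = handoff_to    # fallback agent, if any

    def run(self, query):
        if self.can_handle(query):
            return f"[{self.name}] {self.respond(query)}"
        if self.handoff_to is not None:
            # Delegate the query down the chain of specialists.
            return self.handoff_to.run(query)
        return f"[{self.name}] Sorry, I can't help with that."

# A data-fetching specialist that the economics agent can hand off to.
fetch_agent = Agent(
    name="fetch-agent",
    can_handle=lambda q: True,
    respond=lambda q: "fetched the latest CPI figures",
)

econ_agent = Agent(
    name="econ-agent",
    can_handle=lambda q: "inflation" in q.lower() and "latest" not in q.lower(),
    respond=lambda q: "inflation erodes purchasing power over time",
    handoff_to=fetch_agent,
)

print(econ_agent.run("What is inflation?"))
print(econ_agent.run("What are the latest inflation numbers?"))
```

The first query is answered directly by the economics agent; the second falls outside its competence and is handed off to the fetch agent.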
Recommended read:
References :
- Simon Willison: Mistral's first reasoning LLM - Magistral - was released today and is available in two sizes, an open weights (Apache 2) 24B model called Magistral Small and an API/hosted only model called Magistral Medium.
- Simon Willison's Weblog: Mistral's first reasoning model is out today, in two sizes. There's a 24B Apache 2 licensed open-weights model called Magistral Small (actually Magistral-Small-2506), and a larger API-only model called Magistral Medium.
- THE DECODER: Mistral launches Europe's first reasoning model Magistral but lags behind competitors
- AI News | VentureBeat: The company is signaling that the future of reasoning AI will be both powerful and, in a meaningful way, open to all.
- www.marktechpost.com: How to Create Smart Multi-Agent Workflows Using the Mistral Agents API’s Handoffs Feature
- TestingCatalog: Mistral AI debuts Magistral models focused on advanced reasoning
- www.artificialintelligence-news.com: Mistral AI has pulled back the curtain on Magistral, their first model specifically built for reasoning tasks.
- www.infoworld.com: Mistral AI unveils Magistral reasoning model
- the-decoder.com: The French start-up Mistral is launching its first reasoning model on the market with Magistral. It is designed to enable logical thinking in European languages.
- SiliconANGLE: Mistral AI debuts new Magistral series of reasoning LLMs.
- MarkTechPost: Mistral AI Releases Magistral Series: Advanced Chain-of-Thought LLMs for Enterprise and Open-Source Applications
- WhatIs: What differentiates Mistral AI reasoning model Magistral
- AlternativeTo: Mistral AI debuts Magistral: a transparent, multilingual reasoning model family, including open-source Magistral Small available on Hugging Face and enterprise-focused Magistral Medium available on various platforms.
Carl Franzen@AI News | VentureBeat
//
Mistral AI has launched Magistral, its inaugural reasoning large language model (LLM), available in two distinct versions. Magistral Small, a 24 billion parameter model, is offered with open weights under the Apache 2.0 license, enabling developers to freely use, modify, and distribute the code for commercial or non-commercial purposes. This model can be run locally using tools like Ollama. The other version, Magistral Medium, is accessible exclusively via Mistral’s API and is tailored for enterprise clients, providing traceable reasoning capabilities crucial for compliance in highly regulated sectors such as legal, financial, healthcare, and government.
Mistral is positioning Magistral as a powerful tool for both professional and creative applications. The company highlights Magistral's ability to perform "transparent, multilingual reasoning," making it suitable for tasks involving complex calculations, programming logic, decision trees, and rule-based systems. Additionally, Mistral is promoting Magistral for creative writing, touting its capacity to generate coherent or, if desired, uniquely eccentric content. Users can experiment with Magistral Medium through the "Thinking" mode within Mistral's Le Chat platform, with options for "Pure Thinking" and a high-speed "10x speed" mode powered by Cerebras.
Benchmark tests reveal that Magistral Medium is competitive in the reasoning arena. On the AIME-24 mathematics benchmark, the model achieved an impressive 73.6% accuracy, comparable to its predecessor, Mistral Medium 3, and outperforming DeepSeek's models. Mistral's strategic release of Magistral Small under the Apache 2.0 license is seen as a reaffirmation of its commitment to open source principles. This move contrasts with the company's previous release of Medium 3 as a proprietary offering, which had raised concerns about a shift towards a more closed ecosystem.
Recommended read:
References :
- AI News | VentureBeat: Mistral's first reasoning model, Magistral, launches with large and small Apache 2.0 version.
- Simon Willison: Mistral's first reasoning LLM - Magistral - was released today and is available in two sizes, an open weights (Apache 2) 24B model called Magistral Small and an API/hosted only model called Magistral Medium. My notes here, including running Small locally with Ollama and accessing Medium via my llm-mistral plugin
- Simon Willison's Weblog: Magistral — the first reasoning model by Mistral AI
- the-decoder.com: Mistral launches Europe's first reasoning model Magistral but lags behind competitors
- SiliconANGLE: Mistral AI debuts new Magistral series of reasoning LLMs
- MarkTechPost: Mistral AI Releases Magistral Series: Advanced Chain-of-Thought LLMs for Enterprise and Open-Source Applications
- TestingCatalog: Mistral AI debuts Magistral models focused on advanced reasoning
- siliconangle.com: Mistral AI SAS today introduced Magistral, a new lineup of reasoning-optimized large language models. The LLM series includes two algorithms on launch.
- www.artificialintelligence-news.com: Mistral AI challenges big tech with reasoning model
- WhatIs: What differentiates Mistral AI reasoning model Magistral
Anket Sah@Lambda
//
DeepSeek's latest model, R1-0528, is now available on Lambda's Inference API, an upgrade to the original R1 model released in January 2025. Built on the deepseek_v3 architecture, the new model combines mathematical capability, code generation finesse, and reasoning depth, aiming to challenge the dominance of OpenAI's o3 and Google's Gemini 2.5 Pro. DeepSeek-R1-0528 employs FP8 quantization to handle complex computations efficiently, and uses a mixture-of-experts (MoE) architecture with multi-headed latent attention (MLA) and multi-token prediction (MTP) to handle complex reasoning tasks efficiently.
DeepSeek-R1-0528, while a solid upgrade, didn't generate the same excitement as the initial R1 release, which was seen as a watershed moment for the company when it arrived in January 2025. This time around, R1-0528 is considered a solid model for its price and its status as an open model, best suited to tasks that align with its specific strengths. The original release created a "DeepSeek moment," moving markets and inviting comparisons to other frontier models; its free app, with a clean design and visible chain-of-thought, forced other labs to follow suit.
While DeepSeek R1-0528 offers advantages, experts warn of potential risks associated with open-source AI models. Shortly after R1 began dominating headlines, Cisco issued a report claiming DeepSeek failed to block a single harmful prompt when tested against 50 random prompts taken from the HarmBench dataset. The risks include potential misuse for cyber threats, the spread of misinformation, and the reinforcement of biases. There are also concerns about data poisoning, where compromised training data could lead to biased outputs or disinformation. Furthermore, adversaries could modify the models to bypass controls, generate harmful content, or embed backdoors for exploitation.
Recommended read:
@www.marktechpost.com
//
DeepSeek, a Chinese AI startup, has launched an updated version of its R1 reasoning AI model, named DeepSeek-R1-0528. This new iteration brings the open-source model near parity with proprietary paid models like OpenAI’s o3 and Google’s Gemini 2.5 Pro in terms of reasoning capabilities. The model is released under the permissive MIT License, enabling commercial use and customization, marking a commitment to open-source AI development. The model's weights and documentation are available on Hugging Face, facilitating local deployment and API integration.
The DeepSeek-R1-0528 update introduces substantial enhancements in the model's ability to handle complex reasoning tasks across various domains, including mathematics, science, business, and programming. DeepSeek attributes these improvements to leveraging increased computational resources and applying algorithmic optimizations in post-training. Notably, the accuracy on the AIME 2025 test has surged from 70% to 87.5%, demonstrating deeper reasoning processes with an average of 23,000 tokens per question, compared to the previous version's 12,000 tokens.
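The reported gains can be put in perspective with a quick calculation, using only the figures stated above:

```python
# Headline numbers reported for DeepSeek-R1-0528 vs. the January R1 release.
aime_old, aime_new = 0.70, 0.875         # AIME 2025 accuracy
tokens_old, tokens_new = 12_000, 23_000  # average tokens per question

gain_pp = (aime_new - aime_old) * 100  # absolute gain in percentage points
token_ratio = tokens_new / tokens_old  # how much longer the reasoning traces are

print(f"AIME gain: {gain_pp:.1f} percentage points")
print(f"Reasoning length: {token_ratio:.2f}x more tokens per question")
```

The 17.5-point accuracy gain thus comes alongside reasoning traces nearly twice as long, consistent with DeepSeek's "deeper reasoning" framing.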
Alongside enhanced reasoning, the updated R1 model boasts a reduced hallucination rate, which contributes to more reliable and consistent output. Code generation performance has also seen a boost, positioning it as a strong contender in the open-source AI landscape. DeepSeek provides instructions on its GitHub repository for those interested in running the model locally and encourages community feedback and questions. The company aims to provide accessible AI solutions, underscored by the availability of a distilled version of R1-0528, DeepSeek-R1-0528-Qwen3-8B, designed for efficient single-GPU operation.
Recommended read:
References :
- pub.towardsai.net: DeepSeek R1 : Is It Right For You? (A Practical Self‑Assessment for Businesses and Individuals)
- AI News | VentureBeat: DeepSeek R1-0528 arrives in powerful open source challenge to OpenAI o3 and Google Gemini 2.5 Pro
- Analytics Vidhya: New Deepseek R1-0528 Update is INSANE
- Kyle Wiggers: DeepSeek updates its R1 reasoning AI model, releases it on Hugging Face
- MacStories: Details about DeepSeek's R1-0528 model and its improved performance.
- MarkTechPost: Information about DeepSeek's R1-0528 model and its enhancements in math and code performance.
- www.marktechpost.com: DeepSeek, the Chinese AI Unicorn, has released an updated version of its R1 reasoning model, named DeepSeek-R1-0528. This release enhances the model’s capabilities in mathematics, programming, and general logical reasoning, positioning it as a formidable open-source alternative to leading models like OpenAI’s o3 and Google’s Gemini 2.5 Pro. Technical Enhancements The R1-0528 update introduces significant […]
- www.analyticsvidhya.com: When DeepSeek R1 launched in January, it instantly became one of the most talked-about open-source models on the scene, gaining popularity for its sharp reasoning and impressive performance. Fast-forward to today, and DeepSeek is back with a so-called "minor trial upgrade", but don't let the modest name fool you. DeepSeek-R1-0528 delivers major leaps in reasoning, […]
- : The 'Minor Upgrade' That's Anything But: DeepSeek R1-0528 Deep Dive
- Simon Willison: Some notes on the new DeepSeek-R1-0528 - a completely different model from the R1 they released in January, despite having a very similar name Terrible LLM naming has managed to infect the Chinese AI labs too
- TheSequence: The Sequence Radar #554 : The New DeepSeek R1-0528 is Very Impressive
- Fello AI: In late May 2025, Chinese startup DeepSeek quietly rolled out R1-0528, a beefed-up version of its open-source R1 reasoning model.
- felloai.com: Latest DeepSeek Update Called R1-0528 Is Matching OpenAI’s o3 & Gemini 2.5 Pro
@www.marktechpost.com
//
DeepSeek has released a major update to its R1 reasoning model, dubbed DeepSeek-R1-0528, marking a significant step forward in open-source AI. The update boasts enhanced performance in complex reasoning, mathematics, and coding, positioning it as a strong competitor to leading commercial models like OpenAI's o3 and Google's Gemini 2.5 Pro. The model's weights, training recipes, and comprehensive documentation are openly available under the MIT license, fostering transparency and community-driven innovation. This release allows researchers, developers, and businesses to access cutting-edge AI capabilities without the constraints of closed ecosystems or expensive subscriptions.
The DeepSeek-R1-0528 update brings several core improvements. The model's parameter count has increased from 671 billion to 685 billion, enabling it to process and store more intricate patterns. Enhanced chain-of-thought layers deepen the model's reasoning capabilities, making it more reliable in handling multi-step logic problems. Post-training optimizations have also been applied to reduce hallucinations and improve output stability. In practical terms, the update introduces JSON outputs, native function calling, and simplified system prompts, all designed to streamline real-world deployment and enhance the developer experience.
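Native function calling in open models typically follows the common OpenAI-style request shape. The sketch below is hypothetical: the tool name, schema, and model string are invented for illustration, and DeepSeek's documented schema may differ:

```python
import json

# Hypothetical function-calling request in the common OpenAI-style shape.
# The "get_weather" tool, its parameters, and the model string are
# illustrative assumptions, not DeepSeek's documented API.
request = {
    "model": "deepseek-reasoner",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Return current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request, indent=2))
```

A model with native function calling responds to such a request with a structured tool call (name plus JSON arguments) rather than free text, which is what makes real-world agent deployment simpler.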
Specifically, DeepSeek R1-0528 demonstrates a remarkable leap in mathematical reasoning. On the AIME 2025 test, its accuracy improved from 70% to an impressive 87.5%, rivaling OpenAI's o3. This improvement is attributed to "enhanced thinking depth," with the model now utilizing significantly more tokens per question, indicating more thorough and systematic logical analysis. The open-source nature of DeepSeek-R1-0528 empowers users to fine-tune and adapt the model to their specific needs, fostering further innovation and advancements within the AI community.
Recommended read:
References :
- Kyle Wiggers: DeepSeek updates its R1 reasoning AI model, releases it on Hugging Face
- AI News | VentureBeat: VentureBeat article on DeepSeek R1-0528.
- Analytics Vidhya: New Deepseek R1-0528 Update is INSANE
- MacStories: Testing DeepSeek R1-0528 on the M3 Ultra Mac Studio and Installing Local GGUF Models with Ollama on macOS
- www.marktechpost.com: DeepSeek Releases R1-0528: An Open-Source Reasoning AI Model Delivering Enhanced Math and Code Performance with Single-GPU Efficiency
- NextBigFuture.com: DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training.
- : In the early hours of May 29, Chinese AI startup DeepSeek quietly open-sourced the latest iteration of its R1 large language model, DeepSeek-R1-0528, on the Hugging Face platform .
- www.computerworld.com: Reports that DeepSeek releases a new version of its R1 reasoning AI model.
- techcrunch.com: DeepSeek updates its R1 reasoning AI model, releases it on Hugging Face
- the-decoder.com: Deepseek's R1 model closes the gap with OpenAI and Google after major update
- Simon Willison: Some notes on the new DeepSeek-R1-0528 - a completely different model from the R1 they released in January, despite having a very similar name Terrible LLM naming has managed to infect the Chinese AI labs too
- Analytics India Magazine: The new DeepSeek-R1 Is as good as OpenAI o3 and Gemini 2.5 Pro
- : The 'Minor Upgrade' That's Anything But: DeepSeek R1-0528 Deep Dive
- TheSequence: This article provides an overview of the new DeepSeek R1-0528 model and notes its improvements over the prior model released in January.
- Kyle Wiggers: News about the release of DeepSeek's updated R1 AI model, emphasizing its increased censorship.
- Fello AI: Reports that the R1-0528 model from DeepSeek is matching the capabilities of OpenAI's o3 and Google's Gemini 2.5 Pro.
- felloai.com: Latest DeepSeek Update Called R1-0528 Is Matching OpenAI’s o3 & Gemini 2.5 Pro
- www.tomsguide.com: DeepSeek’s latest update is a serious threat to ChatGPT and Google — here’s why
@www.analyticsvidhya.com
//
OpenAI recently unveiled its groundbreaking o3 and o4-mini AI models, representing a significant leap in visual problem-solving and tool-using artificial intelligence. These models can manipulate and reason with images, integrating them directly into their problem-solving process. This unlocks a new class of problem-solving that blends visual and textual reasoning, allowing the AI to not just see an image, but to "think with it." The models can also autonomously utilize various tools within ChatGPT, such as web search, code execution, file analysis, and image generation, all within a single task flow.
Alongside the reasoning models, OpenAI's GPT-4.1 series (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano) is designed to improve coding capabilities. GPT-4.1 delivers better performance at lower prices, scoring 54.6% on SWE-bench Verified, a 21.4 percentage point increase over GPT-4o and a substantial gain in practical software engineering capability. Most notably, GPT-4.1 offers up to one million tokens of input context, compared to GPT-4o's 128k tokens, making it suitable for processing large codebases and extensive documentation. GPT-4.1 mini and nano also offer performance boosts at reduced latency and cost.
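The stated figures imply a baseline and a context-window multiple that are worth making explicit. A quick check, using only the numbers quoted above:

```python
# Figures reported for GPT-4.1 vs. GPT-4o in the paragraph above.
gpt41_swebench = 54.6  # % on SWE-bench Verified
gain_pp = 21.4         # stated improvement in percentage points

# The implied GPT-4o baseline follows directly from the stated gain.
gpt4o_swebench = gpt41_swebench - gain_pp

# Input-context growth: 1M tokens vs. GPT-4o's 128k.
context_ratio = 1_000_000 / 128_000

print(f"Implied GPT-4o SWE-bench score: {gpt4o_swebench:.1f}%")
print(f"Context window: ~{context_ratio:.1f}x larger")
```

So the quoted gain implies GPT-4o scored roughly 33.2% on the same benchmark, and the context window grew nearly eightfold.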
The new models are available to ChatGPT Plus, Pro, and Team users, with Enterprise and education users gaining access soon. While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks. Combined with Deep Research products, o3 and o4-mini make AI-assisted, search-based research genuinely effective.
Recommended read:
References :
- bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
- TestingCatalog: OpenAI’s o3 and o4‑mini bring smarter tools and faster reasoning to ChatGPT
- thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini. These models feel incredibly smart.
- venturebeat.com: OpenAI launches groundbreaking o3 and o4-mini AI models that can manipulate and reason with images, representing a major advance in visual problem-solving and tool-using artificial intelligence.
- www.techrepublic.com: OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will get access next week.
- the-decoder.com: OpenAI's o3 achieves near-perfect performance on long context benchmark
- the-decoder.com: Safety assessments show that OpenAI's o3 is probably the company's riskiest AI model to date
- www.unite.ai: Inside OpenAI’s o3 and o4‑mini: Unlocking New Possibilities Through Multimodal Reasoning and Integrated Toolsets
- thezvi.wordpress.com: Discusses the release of OpenAI's o3 and o4-mini reasoning models and their enhanced capabilities.
- Simon Willison's Weblog: OpenAI o3 and o4-mini System Card
- Interconnects: OpenAI's o3: Over-optimization is back and weirder than ever. Tools, true rewards, and a new direction for language models.
- techstrong.ai: Nobody’s Perfect: OpenAI o3, o4 Reasoning Models Have Some Kinks
- bsky.app: It's been a couple of years since GPT-4 powered Bing, but with the various Deep Research products and now o3/o4-mini I'm ready to say that AI assisted search-based research actually works now
- www.analyticsvidhya.com: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
- pub.towardsai.net: TAI#149: OpenAI’s Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia Nemotron-H) Also, Grok-3 Mini Shakes Up Cost Efficiency, Codex, Cohere Embed 4, PerceptionLM & more.
- Last Week in AI: Last Week in AI #307 - GPT 4.1, o3, o4-mini, Gemini 2.5 Flash, Veo 2
- composio.dev: OpenAI o3 vs. Gemini 2. 5 Pro vs. o4-mini
- Towards AI: Details about Open AI's Agentic O3 models
@www.analyticsvidhya.com
//
OpenAI has launched its latest AI models, o3 and o4-mini, marking a significant upgrade in the company's offerings. According to Greg Brockman of OpenAI, these models "feel incredibly smart" and are already demonstrating potential in generating novel ideas for top scientists. These models are designed to provide better access to tools and enhance the ability to discern when to use them, ultimately delivering more practical value. The focus is on effective tool utilization, stringing tasks together, and maintaining persistence, which are key strengths of the o3 model. Sam Altman has announced the forthcoming release of o3-pro to the pro tier in the coming weeks.
The o3 model, in particular, is highlighted for its capabilities and tool use. It excels in scenarios requiring image generation, with or without reasoning, provided it has the necessary tools, and it has even been used to answer questions about, and help write, reviews of itself. However, concerns have been raised about naming conventions, with confusion over the relationship between models like 4o-mini, o4-mini, and o4-mini-high. For Plus users, usage is limited to 50 queries a week for o3, 150 a day for o4-mini, and 50 a day for o4-mini-high, along with 10 Deep Research queries per month.
The o3 and o4-mini models bring smarter tools and faster reasoning to ChatGPT, deciding for themselves when to invoke web search, Python, file analysis, or image generation, and finishing multi-step tasks in under a minute. The models can also "think with images," accepting sketches or screenshots and adjusting them during reasoning. o3 sets new highs on Codeforces and SWE‑bench and makes 20% fewer major errors than o1, while the leaner o4‑mini scores 99.5% pass‑at‑1 on the 2025 AIME with tools and offers higher rate limits. The new features are currently available to Pro and Team subscribers, with Enterprise and Education tiers and API access to follow soon.
Recommended read:
References :
- thezvi.wordpress.com: Thezvi WordPress post discussing OpenAI's o3 and o4-mini models.
- Interconnects: Interconnects.ai article about OpenAI's o3 over-optimization.
- TestingCatalog: testingcatalog.com article about OpenAI's o3 and o4-mini bringing smarter tools and faster reasoning to ChatGPT
- thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini. Greg Brockman (OpenAI): Just released o3 and o4-mini! These models feel incredibly smart. We’ve heard from top scientists that they produce useful novel ideas. Excited to see their …
- www.techrepublic.com: OpenAI’s New AI Models o3 and o4-mini Can Now ‘Think With Images’
- bdtechtalks.com: OpenAI's new reasoning models, o3 and o4-mini, enhance problem-solving capabilities and tool use, making them more effective than their predecessors.
- bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
- THE DECODER: OpenAI has made the Deep Research tool in ChatGPT available to free-tier users. Access is limited to five uses per month, using a lightweight version based on the o4-mini-model.
- TestingCatalog: OpenAI may have increased the o3 model's quota to 50 messages/day and added task-scheduling to o3 and o4 Mini. An "o3 Pro" tier might be on the horizon.
- www.tomsguide.com: OpenAI has unveiled a cheaper but less effective deep research tool that'll be used across tiers and also available for free.
- TestingCatalog: OpenAI tests Deep Research Mini tool for free ChatGPT users
@www.analyticsvidhya.com
//
OpenAI has recently launched its o3 and o4-mini models, marking a shift towards AI agents with enhanced tool-use capabilities. These models are specifically designed to excel in areas such as web search, code interpretation, and memory utilization, leveraging reinforcement learning to optimize their performance. The focus is on creating AI that can intelligently use tools in a loop, behaving more like a streamlined and rapid-response system for complex tasks. The development underscores a growing industry trend of major AI labs delivering inference-optimized models ready for immediate deployment.
The o3 model stands out for its ability to provide quick answers, often within 30 seconds to three minutes, a significant improvement over the longer response times of previous models. This speed is coupled with integrated tool use, making it suitable for real-world applications requiring quick, actionable insights. Another key advantage of o3 is its capability to manipulate image inputs using code, allowing it to identify key features by cropping and zooming, which has been demonstrated in tasks such as the "GeoGuessr" game.
While o3 demonstrates strengths across various benchmarks, tests have also shown performance differences relative to other models such as Gemini 2.5, and even its smaller counterpart, o4-mini. o3 leads on most benchmarks and set a new state of the art of 79.60% on the Aider polyglot coding benchmark, but at much higher cost. When o3 is used as a planner paired with GPT-4.1, however, the pair scored a new SOTA of 83% at 65% of the cost, though this remains expensive. One analysis notes the importance of context awareness when iterating on code, which Gemini 2.5 seems to handle better than o3 and o4-mini. Overall, the models represent OpenAI's continued push towards more efficient and agentic AI systems.
Recommended read:
References :
- bdtechtalks.com: OpenAI's new reasoning models, o3 and o4-mini, enhance problem-solving capabilities and tool use, making them more effective than their predecessors.
- Data Phoenix: OpenAI has launched o3 and o4-mini, which combine sophisticated reasoning capabilities with comprehensive tool integration.
- THE DECODER: OpenAI's new language model o3 shows concrete signs of deception, manipulation and sabotage behavior for the first time.
- thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini.
- Simon Willison's Weblog: I'm surprised to see a combined System Card for o3 and o4-mini in the same document - I'd expect to see these covered separately. The opening paragraph calls out the most interesting new ability of these models (see also
- techstrong.ai: Nobody’s Perfect: OpenAI o3, o4 Reasoning Models Have Some Kinks
- Analytics Vidhya: OpenAI's o3 and o4-mini models have advanced reasoning capabilities. They have demonstrated success in problem-solving tasks in various areas, from mathematics to coding, with results showing potential advantages in efficiency and capabilities compared to prior generations.
- pub.towardsai.net: Louie Peters analyzes OpenAI's o3, DeepMind's Gemma, and Nvidia's Nemotron-H, focusing on inference-optimized open-weight models.
- Towards AI: Towards AI Editorial Team on OpenAI's o3 and o4-mini models, emphasizing tool use and agentic capabilities.
- composio.dev: OpenAI o3 vs. Gemini 2.5 Pro vs. o4-mini
Chris McKay@Maginative
//
OpenAI has released its latest AI models, o3 and o4-mini, designed to enhance reasoning and tool use within ChatGPT. These models aim to provide users with smarter and faster AI experiences by leveraging web search, Python programming, visual analysis, and image generation. The models are designed to solve complex problems and perform tasks more efficiently, positioning OpenAI competitively in the rapidly evolving AI landscape. Greg Brockman from OpenAI noted the models "feel incredibly smart" and have the potential to positively impact daily life and solve challenging problems.
The o3 model stands out due to its ability to use tools independently, which enables more practical applications. The model determines when and how to utilize tools such as web search, file analysis, and image generation, thus reducing the need for users to specify tool usage with each query. The o3 model sets new standards for reasoning, particularly in coding, mathematics, and visual perception, and has achieved state-of-the-art performance on several competition benchmarks. The model excels in programming, business, consulting, and creative ideation.
For Plus users, usage limits vary: 50 queries per week for o3, 150 per day for o4-mini, and 50 per day for o4-mini-high, alongside 10 Deep Research queries per month. The o3 model is available to ChatGPT Pro and Team subscribers, while the o4-mini models are available across ChatGPT Plus. OpenAI says o3 is also beneficial in generating and critically evaluating novel hypotheses, especially in biology, mathematics, and engineering contexts.
Recommended read:
References :
- Simon Willison's Weblog: OpenAI are really emphasizing tool use with these: For the first time, our reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems.
- the-decoder.com: OpenAI’s new o3 and o4-mini models reason with images and tools
- venturebeat.com: OpenAI launches o3 and o4-mini, AI models that ‘think with images’ and use tools autonomously
- www.analyticsvidhya.com: o3 and o4-mini: OpenAI’s Most Advanced Reasoning Models
- www.tomsguide.com: OpenAI's o3 and o4-mini models
- Maginative: OpenAI’s latest models—o3 and o4-mini—introduce agentic reasoning, full tool integration, and multimodal thinking, setting a new bar for AI performance in both speed and sophistication.
- THE DECODER: OpenAI’s new o3 and o4-mini models reason with images and tools
- Analytics Vidhya: o3 and o4-mini: OpenAI’s Most Advanced Reasoning Models
- www.zdnet.com: These new models are the first to independently use all ChatGPT tools.
- The Tech Basic: OpenAI recently released its new AI models, o3 and o4-mini, to the public. The models use images to solve problems, including sketch interpretation and photo restoration.
- thetechbasic.com: OpenAI's new AI Can "See" and Solve Problems with Pictures
- www.marktechpost.com: OpenAI Introduces o3 and o4-mini: Progressing Towards Agentic AI with Enhanced Multimodal Reasoning
- analyticsindiamag.com: Access to o3 and o4-mini is rolling out today for ChatGPT Plus, Pro, and Team users.
- THE DECODER: OpenAI is expanding its o-series with two new language models featuring improved tool usage and strong performance on complex tasks.
- gHacks Technology News: OpenAI released its latest models, o3 and o4-mini, to enhance the performance and speed of ChatGPT in reasoning tasks.
- www.ghacks.net: OpenAI Launches o3 and o4-Mini models to improve ChatGPT's reasoning abilities
- Data Phoenix: OpenAI releases new reasoning models o3 and o4-mini amid intense competition. OpenAI has launched o3 and o4-mini, which combine sophisticated reasoning capabilities with comprehensive tool integration.
- Shelly Palmer: OpenAI Quietly Reshapes the Landscape with o3 and o4-mini. OpenAI just rolled out a major update to ChatGPT, quietly releasing three new models (o3, o4-mini, and o4-mini-high) that offer the most advanced reasoning capabilities the company has ever shipped.
- THE DECODER: Safety assessments show that OpenAI's o3 is probably the company's riskiest AI model to date
- shellypalmer.com: OpenAI Quietly Reshapes the Landscape with o3 and o4-mini
- BleepingComputer: OpenAI details ChatGPT-o3, o4-mini, o4-mini-high usage limits
- TestingCatalog: OpenAI’s o3 and o4‑mini bring smarter tools and faster reasoning to ChatGPT
- simonwillison.net: Introducing OpenAI o3 and o4-mini
- bdtechtalks.com: What to know about o3 and o4-mini, OpenAI’s new reasoning models
- thezvi.wordpress.com: OpenAI has finally introduced us to the full o3 along with o4-mini. Greg Brockman (OpenAI): Just released o3 and o4-mini! These models feel incredibly smart. We’ve heard from top scientists that they produce useful novel ideas. Excited to see their …
- thezvi.wordpress.com: OpenAI has upgraded its entire suite of models. By all reports, they are back in the game for more than images. GPT-4.1 and especially GPT-4.1-mini are their new API non-reasoning models.
- felloai.com: OpenAI has just launched a brand-new series of GPT models—GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano—that promise major advances in coding, instruction following, and the ability to handle incredibly long contexts.
- Interconnects: OpenAI's o3: Over-optimization is back and weirder than ever
- www.ishir.com: OpenAI has released o3 and o4-mini, adding significant reasoning capabilities to its existing models. These advancements will likely transform the way users interact with AI-powered tools, making them more effective and versatile in tackling complex problems.
- www.bigdatawire.com: OpenAI released the models o3 and o4-mini that offer advanced reasoning capabilities, integrated with tool use, like web searches and code execution.
- Drew Breunig: OpenAI's o3 and o4-mini models offer enhanced reasoning capabilities in mathematical and coding tasks.
- www.techradar.com: ChatGPT model matchup - I pitted OpenAI's o3, o4-mini, GPT-4o, and GPT-4.5 AI models against each other and the results surprised me
- www.techrepublic.com: OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will get access next week.
- Last Week in AI: OpenAI’s new GPT-4.1 AI models focus on coding, OpenAI launches a pair of AI reasoning models, o3 and o4-mini, Google’s newest Gemini AI model focuses on efficiency, and more!
- techcrunch.com: OpenAI’s new reasoning AI models hallucinate more.
- computational-intelligence.blogspot.com: OpenAI's new reasoning models, o3 and o4-mini, are a step up in certain capabilities compared to prior models, but their accuracy is being questioned due to increased instances of hallucinations.
- www.unite.ai: unite.ai article discussing OpenAI's o3 and o4-mini new possibilities through multimodal reasoning and integrated toolsets.
- : On April 16, 2025, OpenAI released upgraded versions of its advanced reasoning models.
- Digital Information World: OpenAI’s Latest o3 and o4-mini AI Models Disappoint Due to More Hallucinations than Older Models
- techcrunch.com: TechCrunch reports on OpenAI's GPT-4.1 models focusing on coding.
- Analytics Vidhya: o3 vs o4-mini vs Gemini 2.5 pro: The Ultimate Reasoning Battle
- THE DECODER: OpenAI's o3 achieves near-perfect performance on long context benchmark.
- the-decoder.com: OpenAI's o3 achieves near-perfect performance on long context benchmark
- www.analyticsvidhya.com: AI models keep getting smarter, but which one truly reasons under pressure? In this blog, we put o3, o4-mini, and Gemini 2.5 Pro through a series of intense challenges: physics puzzles, math problems, coding tasks, and real-world IQ tests.
- Simon Willison's Weblog: This post explores the use of OpenAI's o3 and o4-mini models for conversational AI, highlighting their ability to use tools in their reasoning process.
- Simon Willison's Weblog: The benchmark score on OpenAI's internal PersonQA benchmark (as far as I can tell, no further details of that evaluation have been shared) going from 0.16 for o1 to 0.33 for o3 is interesting, but I don't know if it's interesting enough to produce dozens of headlines along the lines of "OpenAI's o3 and o4-mini hallucinate way higher than previous models"
- techstrong.ai: Techstrong.ai reports OpenAI o3, o4 Reasoning Models Have Some Kinks.
- www.marktechpost.com: OpenAI Releases a Practical Guide to Identifying and Scaling AI Use Cases in Enterprise Workflows
- Towards AI: OpenAI's o3 and o4-mini models have demonstrated promising improvements in reasoning tasks, particularly their use of tools in complex thought processes and enhanced reasoning capabilities.
- Analytics Vidhya: In this article, we explore how OpenAI's o3 reasoning model stands out in tasks demanding analytical thinking and multi-step problem solving, showcasing its capability in accessing and processing information through tools.
- pub.towardsai.net: TAI#149: OpenAI’s Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia…
- composio.dev: OpenAI o3 vs. Gemini 2.5 Pro vs. o4-mini
- : OpenAI o3 and o4-mini are out. They are two reasoning state-of-the-art models. They’re expensive, multimodal, and super efficient at tool use.