News from the AI & ML world
@machinelearning.apple.com
Apple researchers have released a new study questioning the capabilities of Large Reasoning Models (LRMs), casting doubt on the industry's pursuit of Artificial General Intelligence (AGI). The research paper, titled "The Illusion of Thinking," finds that these models, including those from OpenAI, Google DeepMind, Anthropic, and DeepSeek, suffer a "complete accuracy collapse" when faced with sufficiently complex problems. Unlike existing evaluations, which focus primarily on mathematical and coding benchmarks, this study examines the models' reasoning traces, offering insight into how LRMs "think".
Researchers tested various models, including OpenAI's o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet, on puzzles such as the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These environments let the team scale problem complexity precisely while keeping the underlying logical structure constant. Surprisingly, standard language models outperformed LRMs in low-complexity scenarios, and LRMs showed an advantage only at medium complexity. At high complexity, every model's performance collapsed, as shown in the sketch below.
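To make the complexity scaling concrete, here is a minimal sketch (in Swift, with illustrative names not taken from the paper) of why a puzzle like the Tower of Hanoi suits this kind of evaluation: the rules never change, but the minimal solution grows as 2^n - 1 moves, so each added disk doubles the plan a model must produce.

```swift
// Illustrative only: generates the optimal Tower of Hanoi solution so you
// can see how plan length explodes while the logical structure stays fixed.
func solveHanoi(_ n: Int, from: String, to: String, via: String,
                moves: inout [(String, String)]) {
    guard n > 0 else { return }
    solveHanoi(n - 1, from: from, to: via, via: to, moves: &moves)
    moves.append((from, to))  // move the largest remaining disk
    solveHanoi(n - 1, from: via, to: to, via: from, moves: &moves)
}

for n in 1...10 {
    var moves: [(String, String)] = []
    solveHanoi(n, from: "A", to: "C", via: "B", moves: &moves)
    print("disks: \(n), minimal moves: \(moves.count)")  // 2^n - 1
}
```

Because the optimal move sequence is known in closed form, a model's output can be checked step by step at any chosen complexity level, which is what makes these puzzles useful for tracing exactly where reasoning breaks down.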
The study suggests that the so-called reasoning of LRMs may be closer to sophisticated pattern matching: fragile, and prone to failure once complexity rises past a threshold. Apple's research team formalized this as three distinct performance regimes: low-complexity tasks where standard models outperform LRMs, medium-complexity tasks where LRMs show advantages, and high-complexity tasks where all models collapse. Despite these findings, Apple has begun integrating generative AI into its own apps and experiences, and the new Foundation Models framework gives app developers access to the on-device foundation language model.
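For developers curious about that framework, here is a hedged sketch of the basic call pattern, assuming the API shape Apple showed at WWDC 2025 (a LanguageModelSession with a respond(to:) method); exact names and availability may differ across OS releases.

```swift
import FoundationModels

// Assumption: the framework ships with Apple's 2025 OS releases.
@available(iOS 26.0, macOS 26.0, *)
func summarize(_ text: String) async throws -> String {
    // A session wraps the on-device foundation language model.
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Summarize in one sentence: \(text)")
    return response.content
}
```

Running entirely on device, the model avoids a network round trip, which is the main design trade-off against the larger cloud-hosted reasoning models the study examined.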
References:
- The Decoder: LLMs designed for reasoning, like Claude 3.7 and DeepSeek-R1, are supposed to excel at complex problem-solving by simulating thought processes.
- machinelearning.apple.com: Apple machine learning discusses Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- PPC Land: PPC Land reports on Apple study exposes fundamental limits in AI reasoning models through puzzle tests.
- the-decoder.com: The Decoder covers Apple's study, highlighting the limitation in thinking abilities of reasoning models.
- felloai.com: In a breakthrough paper, Apple researchers reveal the uncomfortable truth about large reasoning models (LRMs): their internal “thought processes” might be nothing more than performative illusions.
- Gadgets 360: Apple Claims AI Reasoning Models Suffer From ‘Accuracy Collapse’ When Solving Complex Problems
- futurism.com: Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry
- The Register: Apple AI boffins puncture AGI hype as reasoning models flail on complex planning
- www.theguardian.com: Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
- chatgptiseatingtheworld.com: Apple researchers cast doubt on AI reasoning models of other companies
- www.livescience.com: AI reasoning models aren’t as smart as they were cracked up to be, Apple study claims
- www.computerworld.com: Apple warns: GenAI still isn’t very smart
- Fello AI: Apple's research paper, "The Illusion of Thinking," argues that large reasoning models face a complete accuracy collapse beyond certain complexities, highlighting limitations in their reasoning capabilities.
- WIRED: Apple's research paper challenges the claims of significant reasoning capabilities in current AI models, particularly those relying on pattern matching instead of genuine understanding.
- Analytics Vidhya: Apple Exposes Reasoning Flaws in o3, Claude, and DeepSeek-R1
- www.itpro.com: ‘A complete accuracy collapse’: Apple throws cold water on the potential of AI reasoning – and it's a huge blow for the likes of OpenAI, Google, and Anthropic
- www.tomshardware.com: Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models
- Digital Information World: Apple study questions AI reasoning models in stark new report
- www.theguardian.com: A research paper by Apple has taken the AI world by storm, all but eviscerating the popular notion that large language models (LLMs, and their newest variant, LRMs, large reasoning models) are able to reason reliably.
- AI Alignment Forum: Researchers at Apple released a paper provocatively titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity", which "challenge[s] prevailing assumptions about [language model] capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning".
- Ars OpenForum: New Apple study challenges whether AI models truly "reason" through problems
- 9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study
- VentureBeat: Do reasoning models really "think" or not? Apple research sparks lively debate, response
- www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation