News from the AI & ML world

DeeperML - #appleresearch

Apple Research Questions LLMs' Ability to Reason - Apple research challenges the reasoning capabilities of Large Reasoning Models (LRMs), suggesting they struggle with basic reasoning tasks. The findings have sparked debate within the AI community, with Google on the other side.

Apple researchers are challenging the perceived reasoning capabilities of Large Reasoning Models (LRMs), sparking debate within the AI community. A recent paper from Apple, titled "The Illusion of Thinking," suggests that these models, which generate intermediate thinking steps like Chain-of-Thought reasoning, struggle with fundamental reasoning tasks. The research indicates that current evaluation methods relying on math and code benchmarks are insufficient, as they often suffer from data contamination and fail to assess the structure or quality of the reasoning process.

To address these shortcomings, Apple researchers introduced controllable puzzle environments, including the Tower of Hanoi, River Crossing, Checker Jumping, and Blocks World, allowing for precise manipulation of problem complexity. These puzzles require diverse reasoning abilities, such as constraint satisfaction and sequential planning, and are free from data contamination. The Apple paper concluded that state-of-the-art LRMs ultimately fail to develop generalizable problem-solving capabilities, with accuracy collapsing to zero beyond certain complexities across different environments.
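As a rough illustration of what a controllable puzzle environment can look like, the sketch below models Tower of Hanoi in Python, with the number of disks as the complexity knob. It is not the harness from the Apple paper; the function names (`solve_hanoi`, `is_valid_solution`) and the peg numbering are assumptions made for this example.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (source peg, destination peg), pegs numbered 0..2 (assumed convention)


def solve_hanoi(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> List[Move]:
    """Return the optimal move list for n disks (classic recursive solution)."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, dst, aux)    # park the n-1 smaller disks on the spare peg
        + [(src, dst)]                       # move the largest disk to the target
        + solve_hanoi(n - 1, aux, src, dst)  # bring the smaller disks on top of it
    )


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay the moves; check every step is legal and all disks end on peg 2."""
    pegs = [list(range(n, 0, -1)), [], []]  # peg 0 starts with disks n..1, largest at the bottom
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return False
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))
```

Because the optimal solution for n disks has 2^n - 1 moves, raising n by one roughly doubles the length of a correct answer, which is what makes the disk count a precise complexity dial. A checker like `is_valid_solution` also lets every intermediate move be graded, not just the final answer.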

However, the Apple research has faced criticism. Some experts, such as Professor Seok Joon Kwon, argue that Apple's lack of high-performance hardware, such as a large GPU-based cluster comparable to those operated by Google or Microsoft, could be a factor in its findings. Some observers note that the models perform better on familiar puzzles, suggesting that their success may be linked to training exposure rather than genuine problem-solving skills. Others, such as Alex Lawsen and "C. Opus," argue that the Apple results do not support claims about fundamental reasoning limitations and instead highlight engineering challenges related to token limits and evaluation methods.
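The token-limit objection can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical cost of 10 output tokens per printed move and a hypothetical 64,000-token output budget; neither figure comes from the Apple paper or the rebuttal. Only the 2^n - 1 minimum move count is a property of the Tower of Hanoi puzzle itself.

```python
def min_moves(n_disks: int) -> int:
    """Minimum number of moves needed to solve Tower of Hanoi with n_disks."""
    return 2 ** n_disks - 1


def max_disks_within_budget(token_budget: int, tokens_per_move: int) -> int:
    """Largest disk count whose full move list fits in the output budget.

    Both parameters are illustrative assumptions, not measured values.
    """
    if tokens_per_move <= 0:
        raise ValueError("tokens_per_move must be positive")
    n = 0
    # Keep adding disks while the next size's full solution still fits.
    while min_moves(n + 1) * tokens_per_move <= token_budget:
        n += 1
    return n


# Hypothetical figures: 64,000-token budget, 10 tokens per move.
print(max_disks_within_budget(64_000, 10))
```

Under these assumed numbers, a model asked to write out every move would run out of room after a modest number of disks regardless of how well it reasons, which is the kind of confound the critics point to.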

Top link: www.marktechpost.com

References:

TheSequence: The Sequence Research #663: The Illusion of Thinking, Inside the Most Controversial AI Paper of Recent Weeks
chatgptiseatingtheworld.com: Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models. Opus, Lawsen think so.
www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
arstechnica.com: New Apple study challenges whether AI models truly “reason” through problems
9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

@felloai.com //

Apple Study Exposes Accuracy Collapse in Advanced AI Models - Apple researchers found that Large Reasoning Models (LRMs), including models from OpenAI, DeepSeek, and Google, experience a complete accuracy collapse when faced with highly complex problems, suggesting limitations in achieving human-like AI.

A new study by Apple researchers casts a shadow on the capabilities of cutting-edge artificial intelligence models, suggesting that their reasoning abilities may be fundamentally limited. The study, titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," reveals that large reasoning models (LRMs) experience a 'complete accuracy collapse' when faced with complex problems. This challenges the widespread optimism surrounding the industry's race towards achieving artificial general intelligence (AGI), the theoretical point at which AI can match human cognitive capabilities. The findings raise questions about the reliability and practicality of relying on AI systems for critical decision-making processes.

Apple's study tested LRMs, including models from OpenAI, DeepSeek, and Google, in controlled puzzle environments to assess their problem-solving skills. These puzzles, such as Tower of Hanoi and River Crossing, were designed to evaluate planning, problem-solving, and compositional reasoning. The study found that while these models show improved performance on reasoning benchmarks for low-complexity tasks, their reasoning skills fall apart when tasks exceed a critical complexity threshold. As LRMs approached performance collapse, they began reducing their reasoning effort, a finding the Apple researchers called "particularly concerning."

The implications of this research are significant for the future of AI development and integration. Gary Marcus, a prominent voice of caution on AI capabilities, described the Apple paper as "pretty devastating" and stated that it raises serious questions about the path towards AGI. The research also arrives amid increasing scrutiny of Apple's own AI development, with some alleging the company is lagging behind competitors. Nevertheless, Apple is betting on developers to address these shortcomings, opening its local AI engine to third-party app developers via the Foundation Models framework to encourage the building of AI applications.

Top link: felloai.com

References:

felloai.com: Apple’s Latest Research Exposed Shocking Flaw in Today’s Smartest AI Models
The Register - Software: Apple AI boffins puncture AGI hype as reasoning models flail on complex planning
www.livescience.com: AI reasoning models aren’t as smart as they were cracked up to be, Apple study claims
www.theguardian.com: Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
www.computerworld.com: Apple warns: GenAI still isn’t very smart
futurism.com: Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry
Marcus on AI: Seven replies to the viral Apple reasoning paper – and why they fall short
AI News | VentureBeat: Do reasoning models really “think” or not? Apple research sparks lively debate, response
www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

@machinelearning.apple.com //

Apple Study Exposes Scaling Limits in AI Reasoning - Apple research reveals Large Reasoning Models face accuracy collapse with complex problems, questioning the feasibility of achieving AGI in the near term and suggesting their reasoning may be sophisticated pattern matching.

Apple researchers have released a new study questioning the capabilities of Large Reasoning Models (LRMs), casting doubt on the industry's pursuit of Artificial General Intelligence (AGI). The research paper, titled "The Illusion of Thinking," reveals that these models, including those from OpenAI, Google DeepMind, Anthropic, and DeepSeek, experience a 'complete accuracy collapse' when faced with complex problems. Unlike existing evaluations primarily focused on mathematical and coding benchmarks, this study evaluates the reasoning traces of these models, offering insights into how LRMs "think".

Researchers tested various models, including OpenAI's o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet, using puzzles like the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These environments allowed for the manipulation of complexity while maintaining consistent logical structures. The team discovered that standard language models surprisingly outperformed LRMs in low-complexity scenarios, while LRMs only demonstrated advantages in medium-complexity tasks. However, all models experienced a performance collapse when faced with highly complex tasks.

The study suggests that the so-called reasoning of LRMs may be more akin to sophisticated pattern matching, which is fragile and prone to failure when challenged with significant complexity. Apple's research team identified three distinct performance regimes: low-complexity tasks where standard models outperform LRMs, medium-complexity tasks where LRMs show advantages, and high-complexity tasks where all models collapse. Apple has begun integrating powerful generative AI into its own apps and experiences. The new Foundation Models framework gives app developers access to the on-device foundation language model.

Top link: machinelearning.apple.com

References:

THE DECODER: LLMs designed for reasoning, like Claude 3.7 and Deepseek-R1, are supposed to excel at complex problem-solving by simulating thought processes.
machinelearning.apple.com: Apple machine learning discusses Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
PPC Land: PPC Land reports on Apple study exposes fundamental limits in AI reasoning models through puzzle tests.
the-decoder.com: The Decoder covers Apple's study, highlighting the limitation in thinking abilities of reasoning models.
felloai.com: In a breakthrough paper, Apple researchers reveal the uncomfortable truth about large reasoning models (LRMs): their internal “thought processes” might be nothing more than performative illusions.
Gadgets 360: Apple Claims AI Reasoning Models Suffer From ‘Accuracy Collapse’ When Solving Complex Problems
futurism.com: Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry
The Register - Software: Apple AI boffins puncture AGI hype as reasoning models flail on complex planning
www.theguardian.com: Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
chatgptiseatingtheworld.com: Apple researchers cast doubt on AI reasoning models of other companies
www.livescience.com: AI reasoning models aren’t as smart as they were cracked up to be, Apple study claims
www.computerworld.com: Apple warns: GenAI still isn’t very smart
Fello AI: Apple's research paper, "The Illusion of Thinking," argues that large reasoning models face a complete accuracy collapse beyond certain complexities, highlighting limitations in their reasoning capabilities.
WIRED: Apple's research paper challenges the claims of significant reasoning capabilities in current AI models, particularly those relying on pattern matching instead of genuine understanding.
Analytics Vidhya: Apple Exposes Reasoning Flaws in o3, Claude, and DeepSeek-R1
www.itpro.com: ‘A complete accuracy collapse’: Apple throws cold water on the potential of AI reasoning – and it's a huge blow for the likes of OpenAI, Google, and Anthropic
www.tomshardware.com: Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models
Digital Information World: Apple study questions AI reasoning models in stark new report
www.theguardian.com: A research paper by Apple has taken the AI world by storm, all but eviscerating the popular notion that large language models (LLMs, and their newest variant, LRMs, large reasoning models) are able to reason reliably.
AI Alignment Forum: Researchers at Apple released a paper provocatively titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”, which “challenge[s] prevailing assumptions about [language model] capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning”.
Ars OpenForum: New Apple study challenges whether AI models truly “reason” through problems
9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study
AI News | VentureBeat: Do reasoning models really “think” or not? Apple research sparks lively debate, response
www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
MarkTechPost: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
9to5mac.com: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

@felloai.com //

Apple Research Reveals "Complete Accuracy Collapse" in Advanced AI Models - Apple researchers found that Large Reasoning Models (LRMs), including models from OpenAI, DeepSeek, and Google, experience a complete accuracy collapse when faced with highly complex problems, suggesting limitations in achieving human-like AI.

Top link: felloai.com

References:

www.theguardian.com: Apple researchers have found “fundamental limitations” in cutting-edge artificial intelligence models, in a paper raising doubts about the technology industry’s race to reach a stage of AI at which it matches human intelligence.
felloai.com: In a breakthrough paper, Apple researchers reveal the uncomfortable truth about large reasoning models (LRMs): their internal “thought processes” might be nothing more than performative illusions.
www.computerworld.com: Filling the void in the few hours before WWDC begins, Apple’s machine learning team raced out of the gate with a research paper, arguing that while the intelligence is artificial, it’s only superficially smart.
www.livescience.com: A new study by Apple has ignited controversy in the AI field by showing how reasoning models undergo 'complete accuracy collapse' when overloaded with complex problems.
