News from the AI & ML world

DeeperML - #airesearch

@www.microsoft.com //
References: syncedreview.com, Source
Advancements in agentic AI are rapidly transforming various sectors, with organizations like Microsoft and Resemble AI leading the charge. Microsoft is demonstrating at TM Forum DTW Ignite 2025 how the synergy between Open Digital Architecture (ODA) and agentic AI is converting industry ambitions into measurable business outcomes within the telecommunications sector. They are focusing on breaking down operational silos, unlocking data's value, increasing efficiency, and accelerating innovation. Meanwhile, Resemble AI is advancing AI voice agents, anticipating the growing momentum of voice-first technologies, with over 74% of enterprises actively piloting or deploying these agents as part of their digital transformation strategies by 2025, according to an IDC report.

Researchers from Penn State University and Duke University have introduced "Multi-Agent Systems Automated Failure Attribution," a significant development in managing complex AI systems. This innovation addresses the challenge of identifying the root cause of failures in multi-agent systems, which can be difficult to diagnose due to the autonomous nature of agent collaboration and long information chains. The researchers have developed a benchmark dataset and several automated attribution methods to enhance the reliability of LLM Multi-Agent systems, transforming failure identification from a perplexing mystery into a quantifiable problem.
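
The paper frames attribution as pinpointing the decisive faulty step in an agent transcript. As a toy illustration of that framing only (the log schema and checker below are hypothetical, not the benchmark's actual format), a minimal attributor can walk the ordered log and report the first agent whose output fails a correctness check:

```python
# Toy illustration of automated failure attribution in a multi-agent
# transcript: walk the ordered log of agent steps and report the first
# step whose output contradicts ground truth. The log schema here is
# hypothetical, not the benchmark's actual format.

def attribute_failure(log, check):
    """Return (agent, step_index) of the first faulty step, or None.

    log   -- ordered list of (agent_name, output) tuples
    check -- predicate: check(output) -> True if the step is sound
    """
    for i, (agent, output) in enumerate(log):
        if not check(output):
            return agent, i
    return None

# Example: the solver miscomputes, and the critic propagates the error,
# so blame attaches to the solver, not the last agent in the chain.
log = [
    ("planner", ("2 + 2", 4)),
    ("solver",  ("4 * 3", 13)),   # first wrong step
    ("critic",  ("13 - 1", 12)),  # consistent with the bad input
]
print(attribute_failure(log, lambda o: eval(o[0]) == o[1]))  # ('solver', 1)
```

The point of the benchmark is precisely that real transcripts lack such a clean per-step checker, which is what makes attribution hard.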

Microsoft's contributions to TM Forum initiatives, including co-authoring Open APIs and donating hardened code, highlight the importance of standards-based foundations in AI development. By aligning Microsoft Azure's cloud-native foundations with ODA's composable blueprint, Microsoft is helping operators assemble solutions without proprietary silos, leading to faster interoperability, reduced integration costs, and quicker time-to-value for new digital services. This approach addresses fragmented observability by prescribing a common logging contract and integrating with Azure Monitor, reducing the time to detect anomalies and enabling teams to focus on proactive optimization.

Recommended read:
References :
  • syncedreview.com: "Automated failure attribution" is a crucial component in the development lifecycle of Multi-Agent systems. It has the potential to transform the challenge of identifying "what went wrong and who is to blame" from a perplexing mystery into a quantifiable and analyzable problem.
  • Source: At TM Forum DTW Ignite 2025, Microsoft is demonstrating how the complementary relationship between ODA and agentic AI converts ambitions into measurable business outcomes.

@www.marktechpost.com //
Apple researchers are challenging the perceived reasoning capabilities of Large Reasoning Models (LRMs), sparking debate within the AI community. A recent paper from Apple, titled "The Illusion of Thinking," suggests that these models, which generate intermediate thinking steps like Chain-of-Thought reasoning, struggle with fundamental reasoning tasks. The research indicates that current evaluation methods relying on math and code benchmarks are insufficient, as they often suffer from data contamination and fail to assess the structure or quality of the reasoning process.

To address these shortcomings, Apple researchers introduced controllable puzzle environments, including the Tower of Hanoi, River Crossing, Checker Jumping, and Blocks World, allowing for precise manipulation of problem complexity. These puzzles require diverse reasoning abilities, such as constraint satisfaction and sequential planning, and are free from data contamination. The Apple paper concluded that state-of-the-art LRMs ultimately fail to develop generalizable problem-solving capabilities, with accuracy collapsing to zero beyond certain complexities across different environments.
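
The appeal of these puzzles is that difficulty reduces to a single tunable parameter: Tower of Hanoi with n disks has a unique optimal solution of 2^n − 1 moves, so complexity scales predictably while the rules stay fixed. A minimal sketch of such a controllable environment (illustrative only, not the paper's actual harness):

```python
# Minimal Tower of Hanoi environment with a single complexity knob
# (n disks). Illustrative sketch, not the paper's code: the optimal
# solution length 2**n - 1 grows exponentially with n, so difficulty
# can be dialed up while the logical structure stays identical.

class Hanoi:
    def __init__(self, n):
        self.n = n
        self.pegs = [list(range(n, 0, -1)), [], []]  # largest disk at bottom

    def move(self, src, dst):
        """Apply a move if legal; return True on success."""
        if not self.pegs[src]:
            return False
        disk = self.pegs[src][-1]
        if self.pegs[dst] and self.pegs[dst][-1] < disk:
            return False  # can't place a larger disk on a smaller one
        self.pegs[dst].append(self.pegs[src].pop())
        return True

    def solved(self):
        return len(self.pegs[2]) == self.n

    def optimal_length(self):
        return 2 ** self.n - 1

env = Hanoi(3)
print(env.optimal_length())   # 7 moves for 3 disks
print(env.move(0, 2))         # True: smallest disk to the goal peg
print(env.move(0, 2))         # False: disk 2 can't sit on disk 1
```

Because every intermediate state can be validated mechanically, an evaluator can score the structure of a model's reasoning trace, not just its final answer.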

However, the Apple research has faced criticism. Experts such as Professor Seok Joon Kwon argue that Apple's lack of high-performance hardware, such as a large GPU-based cluster comparable to those operated by Google or Microsoft, could be a factor in their findings. Some argue that the models perform better on familiar puzzles, suggesting that their success may be linked to training exposure rather than genuine problem-solving skills. Others, such as Alex Lawsen and "C. Opus," argue that the Apple researchers' results don't support claims about fundamental reasoning limitations, but rather highlight engineering challenges related to token limits and evaluation methods.

Recommended read:
References :
  • TheSequence: The Sequence Research #663: The Illusion of Thinking, Inside the Most Controversial AI Paper of Recent Weeks
  • chatgptiseatingtheworld.com: Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models? Opus, Lawsen think so.
  • www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
  • arstechnica.com: New Apple study challenges whether AI models truly “reason” through problems
  • 9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

nftjedi@chatgptiseatingtheworld.com //
Apple researchers recently published a study titled "The Illusion of Thinking," suggesting that advanced large language models (LLMs) struggle with true reasoning, relying instead on pattern matching. The study presented findings based on tasks like the Tower of Hanoi puzzle, where models purportedly failed when complexity increased, leading to the conclusion that these models possess limited problem-solving abilities. However, these conclusions are now under scrutiny, with critics arguing the experiments were not fairly designed.

Alex Lawsen of Open Philanthropy has published a counter-study challenging the foundations of Apple's claims. Lawsen argues that models like Claude, Gemini, and OpenAI's latest systems weren't failing due to cognitive limits, but rather because the evaluation methods didn't account for key technical constraints. One issue raised was that models were often cut off from providing full answers because they neared their maximum token limit, a built-in cap on output text, which Apple's evaluation counted as a reasoning failure rather than a practical limitation.

Another point of contention involved the River Crossing test, where models faced unsolvable problem setups. When the models correctly identified the tasks as impossible and refused to attempt them, they were still marked wrong. Furthermore, the evaluation system strictly judged outputs against exhaustive solutions, failing to credit models for partial but correct answers, pattern recognition, or strategic shortcuts. To illustrate, Lawsen demonstrated that when models were instructed to write a program to solve the Hanoi puzzle, they delivered accurate, scalable solutions even with 15 disks, contradicting Apple's assertion of limitations.
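
Lawsen's programmatic demonstration is easy to reproduce in spirit: the classic recursive solution is a few lines and scales to 15 disks without difficulty, and counting its output also shows why transcribing every move verbatim collides with output token caps (a sketch, not Lawsen's actual code):

```python
# The classic recursive Tower of Hanoi solution, written as a generator
# so the full move list never has to be materialized at once.

def hanoi(n, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence for n disks."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, dst, aux)  # park n-1 disks on aux
    yield (src, dst)                        # move the largest disk
    yield from hanoi(n - 1, aux, src, dst)  # restack n-1 disks on top

moves = sum(1 for _ in hanoi(15))
print(moves)  # 32767 == 2**15 - 1: correct and scalable, but far too
              # long to transcribe move-by-move within a token budget
```

A model that emits this program has clearly internalized the solution strategy, even if it cannot list all 32,767 moves inside its output window.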

Recommended read:
References :
  • chatgptiseatingtheworld.com: Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models? Opus, Lawsen think so.
  • Digital Information World: Apple’s AI Critique Faces Pushback Over Flawed Testing Methods
  • NextBigFuture.com: Apple Researcher Claims Illusion of AI Thinking Versus OpenAI Solving Ten Disk Puzzle
  • Bernard Marr: Beyond The Hype: What Apple's AI Warning Means For Business Leaders

Chris McKay@Maginative //
Meta is making a significant move in the artificial intelligence race, investing $14.3 billion for a 49% stake in data-labeling startup Scale AI. This deal is more than just a financial investment; it brings Scale AI's CEO, 28-year-old Alexandr Wang, into Meta to lead a new "superintelligence" lab. The move highlights Meta's ambition to develop AI that surpasses human capabilities across multiple domains and is a calculated gamble to regain momentum in the competitive AI landscape, with Meta betting that Scale's Wang is the right partner for an AI reset.

This acquisition reflects Meta's strategic shift towards building partnerships and leveraging external talent. Scale AI isn't a well-known name to the general public, but it's a vital component in the AI industry, providing the labeled training data that powers many AI systems, including those used by OpenAI, Microsoft, Google, and even the U.S. Department of Defense. Meta has agreed to dramatically increase its spending with Scale, though according to one person familiar with the matter, Scale expects some other customers, such as Google and OpenAI, to stop using its services for fear that Meta could use information about their usage to gain a competitive advantage.

The "superintelligence" lab is part of a larger reorganization of Meta's AI divisions, aimed at sharpening the company's focus after facing internal challenges and criticism over its AI product releases. Meta, under CEO Mark Zuckerberg, has been heavily investing in AI infrastructure and product development since the rise of ChatGPT, launching its own large language model family, Llama. Zuckerberg has been personally recruiting top researchers to boost its AI efforts. The new lab will focus on developing a theoretical form of AI that surpasses human cognitive capabilities, a long-term and highly speculative goal that Meta is now seriously pursuing.

Recommended read:
References :
  • siliconangle.com: Meta Platforms Inc. is reportedly forming a new lab to develop superintelligence, a term for artificial intelligence models that can outperform humans at many tasks.
  • THE DECODER: Meta might invest $10 billion in Scale AI, following the company's underwhelming Llama 4 launch earlier this year.
  • AIwire: Meta Taps Scale AI CEO to Lead New Superintelligence Lab
  • Maginative: Meta has acquired a 49% stake in data-labeling startup Scale AI for $14.3 billion, bringing CEO Alexandr Wang on board to lead a new "superintelligence" lab as the company scrambles to catch up in the AI race.
  • SiliconANGLE: Scale AI CEO Alexandr Wang departs to join Meta after securing multibillion-dollar investment
  • www.artificialintelligence-news.com: Meta buys stake in Scale AI, raising antitrust concerns
  • Charlie Fink: Meta’s Scale AI Bet, Hollywood Sues AI, XR Industry Readies AR Glasses

@www.marktechpost.com //
Meta AI has announced the release of V-JEPA 2, an open-source world model designed to enhance robots' ability to understand and interact with physical environments. V-JEPA 2 builds upon the Joint Embedding Predictive Architecture (JEPA) and leverages self-supervised learning from over one million hours of video and images. This approach allows the model to learn abstract concepts and predict future states, enabling robots to perform tasks in unfamiliar settings and improving their understanding of motion and appearance. The model can be useful in manufacturing automation, surveillance analytics, in-building logistics, robotics, and other advanced use cases.

Meta researchers scaled JEPA pretraining by constructing a 22M-sample dataset (VideoMix22M) from public sources and expanded the encoder capacity to over 1B parameters. They also adopted a progressive resolution strategy and extended pretraining to 252K iterations, reaching 64 frames at 384x384 resolution. V-JEPA 2 avoids the inefficiencies of pixel-level prediction by focusing on predictable scene dynamics while disregarding irrelevant noise. This abstraction makes the system both more efficient and more robust, with the model reportedly needing just 16 seconds to plan a robot control action.
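
The difference from pixel-level prediction can be sketched in a few lines. Everything below is a toy with random linear maps, meant only to show that a JEPA-style loss lives in a small embedding space rather than over raw pixels; it is not Meta's architecture:

```python
import numpy as np

# Toy JEPA-style objective: predict the *embedding* of the next frame
# rather than its raw pixels. Random linear maps stand in for the
# encoder and predictor; V-JEPA 2's real encoder is a >1B-param ViT.
rng = np.random.default_rng(0)
D_FRAME, D_EMB = 64, 8               # 64 "pixels" -> 8-d latent

encoder = rng.normal(size=(D_FRAME, D_EMB)) / np.sqrt(D_FRAME)
predictor = rng.normal(size=(D_EMB, D_EMB)) / np.sqrt(D_EMB)

frame_t = rng.normal(size=D_FRAME)   # current video frame
frame_next = rng.normal(size=D_FRAME)  # future video frame

z_t = frame_t @ encoder              # embed current frame
z_target = frame_next @ encoder      # embed future frame (target)
z_pred = z_t @ predictor             # predict the future *in latent space*

loss = np.mean((z_pred - z_target) ** 2)
print(z_target.shape)  # (8,): the loss targets 8 latent dimensions,
                       # not 64 raw pixels, so unpredictable pixel
                       # noise never enters the objective
```

Because the target is a compact embedding, the predictor is never penalized for failing to reproduce details that are inherently unpredictable, which is the efficiency argument the release makes against generative pixel models.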

Meta's V-JEPA 2 represents a step toward achieving "advanced machine intelligence" by enabling robots to interact effectively in environments they have never encountered before. The model achieves state-of-the-art results on motion recognition and action prediction benchmarks and can control robots without additional training. By focusing on the essential and predictable aspects of a scene, V-JEPA 2 aims to provide AI agents with the intuitive physics needed for effective planning and reasoning in the real world, distinguishing itself from generative models that attempt to predict every detail.

Recommended read:
References :
  • www.computerworld.com: Meta’s recent unveiling of V-JEPA 2 marks a quiet but significant shift in the evolution of AI vision systems, and it’s one enterprise leaders can’t afford to overlook,
  • www.marktechpost.com: Meta AI Releases V-JEPA 2: Open-Source Self-Supervised World Models for Understanding, Prediction, and Planning
  • The Tech Portal: Social media company Meta has now introduced V-JEPA 2, a new open-source…
  • about.fb.com: Our New Model Helps AI Think Before it Acts
  • AI News | VentureBeat: Meta’s new world model lets robots manipulate objects in environments they’ve never encountered before
  • www.infoq.com: Meta Introduces V-JEPA 2, a Video-Based World Model for Physical Reasoning
  • eWEEK: Dubbed as a “world model,” Meta’s New V-JEPA 2 AI model uses visual understanding and physical intuition to enhance reasoning in robotics and AI agents.

@Latest news //
References: Maginative, SiliconANGLE, AIwire ...
Meta has made a significant move in the artificial intelligence race by acquiring a 49% stake in data-labeling startup Scale AI for a staggering $14.3 billion. This investment values Scale AI at over $29 billion and brings Scale AI's founder and CEO, Alexandr Wang, on board to lead a new "superintelligence" lab within Meta. The move underscores Meta's determination to accelerate its AI development and compete more effectively with industry leaders like OpenAI and Google.

This strategic acquisition signifies a shift in Meta's approach to AI development, where Zuckerberg has been personally recruiting top researchers from other companies. Scale AI, while not widely known to the public, plays a crucial role in the AI ecosystem by providing the labeled training data that powers large language models. They have a global workforce of over 200,000 contractors to label various forms of data. By bringing Wang and a portion of his team in-house, Meta aims to gain a competitive edge in building AI models that surpass human capabilities.

Wang, who founded Scale AI in 2016 after dropping out of MIT, has grown the company into a major player in the AI industry. Scale AI works with businesses, governments, and labs to exploit the benefits of artificial intelligence, and has a client list that includes OpenAI, Microsoft, Meta, Google, and the U.S. Department of Defense. As Wang departs for Meta, Jason Droege, former Uber Eats founder and current Chief Strategy Officer, will step in as interim CEO to ensure that Scale AI continues to operate independently despite Meta's significant stake.

Recommended read:
References :
  • Maginative: Meta has acquired a 49% stake in data-labeling startup Scale AI for $14.3 billion, bringing CEO Alexandr Wang on board to lead a new "superintelligence" lab as the company scrambles to catch up in the AI race.
  • SiliconANGLE: Scale AI Inc. says its founder and Chief Executive Alexandr Wang is leaving to join Meta Platforms Inc., after confirming that the social media giant has made a “significant” investment in the company.
  • techxplore.com: Scale AI announced a major new investment by Meta late Thursday that values the startup at more than $29 billion and puts its founder to work for the tech titan.
  • AIwire: Meta Taps Scale AI CEO to Lead New Superintelligence Lab
  • Verdict: Meta invests in Scale AI and appoints founder to lead AI unit

@felloai.com //
A new study by Apple researchers casts a shadow on the capabilities of cutting-edge artificial intelligence models, suggesting that their reasoning abilities may be fundamentally limited. The study, titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," reveals that large reasoning models (LRMs) experience a 'complete accuracy collapse' when faced with complex problems. This challenges the widespread optimism surrounding the industry's race towards achieving artificial general intelligence (AGI), the theoretical point at which AI can match human cognitive capabilities. The findings raise questions about the reliability and practicality of relying on AI systems for critical decision-making processes.

Apple's study involved testing LRMs, including models from OpenAI, DeepSeek, and Google, using controlled puzzle environments to assess their problem-solving skills. These puzzles, such as Tower of Hanoi and River Crossing, were designed to evaluate planning, problem-solving, and compositional reasoning. The study found that while these models show improved performance on reasoning benchmarks for low-complexity tasks, their reasoning skills fall apart when tasks exceed a critical threshold. Researchers observed that as LRMs approached performance collapse, they began reducing their reasoning effort, a finding that Apple researchers found "particularly concerning."

The implications of this research are significant for the future of AI development and integration. Gary Marcus, a prominent voice of caution on AI capabilities, described the Apple paper as "pretty devastating" and stated that it raises serious questions about the path towards AGI. This research also arrives amid increasing scrutiny surrounding Apple's AI development, with some alleging the company is lagging behind competitors. Nevertheless, Apple is betting on developers to address these shortcomings, opening up its local AI engine to third-party app developers via the Foundation Models framework to encourage them to build AI applications that work around these limitations.

Recommended read:
References :
  • felloai.com: Apple’s Latest Research Exposed Shocking Flaw in Today’s Smartest AI Models
  • The Register - Software: Apple AI boffins puncture AGI hype as reasoning models flail on complex planning
  • www.livescience.com: AI reasoning models aren’t as smart as they were cracked up to be, Apple study claims
  • www.theguardian.com: Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
  • www.computerworld.com: Apple warns: GenAI still isn’t very smart
  • futurism.com: Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry
  • Marcus on AI: Seven replies to the viral Apple reasoning paper – and why they fall short
  • AI News | VentureBeat: Do reasoning models really “think” or not? Apple research sparks lively debate, response
  • www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
  • 9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

@Latest news //
References: www.eweek.com, Quartz, the-decoder.com ...
Meta CEO Mark Zuckerberg is spearheading a new initiative to develop artificial general intelligence (AGI), recruiting top AI researchers to form an elite team. This push aims to create AI systems capable of performing any intellectual task that a human can, positioning Meta to compete directly with tech giants like Google and OpenAI. Zuckerberg's involvement includes personal recruitment efforts, indicating the high priority Meta is placing on this project. This signals a significant shift for Meta, aiming to lead in the rapidly evolving AI landscape.

Disappointment with the performance of Meta's Llama 4 model compared to competitors like OpenAI's GPT-4 and Google's Gemini spurred Zuckerberg's increased focus on AGI. Internally, Llama 4 was considered inadequate in real-world user experience, lacking coherence and usability. Furthermore, Meta's metaverse investments have not yielded the anticipated results, leading the company to redirect its focus and resources toward AI, aiming to recapture relevance and mindshare in the tech industry. With tens of billions already invested in infrastructure and foundational models, Meta is now fully committed to achieving AGI.

To further bolster its AI ambitions, Meta is investing heavily in AI start-up Scale AI. Meta has invested $14.3 billion (roughly €12 billion) to acquire a 49% stake in Scale AI. The investment has caused Google to end its $200 million partnership with Scale AI. Zuckerberg has also offered large salaries to poach AI talent. This move is part of Meta's broader strategy to build superintelligence and challenge the dominance of other AI leaders. Meta's aggressive pursuit of AI talent and strategic investments highlight its determination to become a frontrunner in the race to build AGI.

Recommended read:
References :
  • www.eweek.com: Meta’s Bold New Lab Takes Aim at Superintelligent AI, As Zuckerberg Enters ‘Founder Mode’
  • Quartz: Meta is making a huge push for AI 'superintelligence'
  • techstrong.ai: Meta Platforms Inc. is creating an artificial intelligence (AI) lab to pursue an AI system that surpasses human intelligence with "superintelligence."
  • the-decoder.com: Meta CEO Mark Zuckerberg is personally assembling a new team of experts to close the gap in artificial intelligence development.
  • www.unite.ai: Meta's reported $10 billion investment in Scale AI represents far more than a simple funding round—it signals a fundamental strategic evolution in how tech giants view the AI arms race.
  • Analytics India Magazine: Why Meta is Investing in Scale AI
  • SiliconANGLE: Meta reportedly forming superintelligence lab amid Llama 4 Behemoth delays
  • bsky.app: Let's talk about Meta's investment in Scale AI and whether *this* reorg will be different than the others
  • XR Today: A New Lifeline for Reality Labs? Meta Partners with Anduril on Military XR
  • www.theguardian.com: Meta to announce $15bn investment in bid to achieve computerised ‘superintelligence’
  • Latest news: The company is also apparently in talks to invest more than $10 billion in Scale AI.
  • AIwire: Meta Taps Scale AI CEO to Lead New Superintelligence Lab
  • The Rundown AI: PLUS: Meta launching ‘superintelligence’ lab with Scale AI founder
  • siliconangle.com: Scale AI CEO Alexandr Wang departs to join Meta after securing multibillion-dollar investment
  • Maginative: Meta has acquired a 49% stake in data-labeling startup Scale AI for $14.3 billion, bringing CEO Alexandr Wang on board to lead a new "superintelligence" lab as the company scrambles to catch up in the AI race.
  • SiliconANGLE: Scale AI Inc. says its founder and Chief Executive Alexandr Wang is leaving to join Meta Platforms Inc., after confirming that the social media giant has made a “significant” investment in the company.
  • techxplore.com: Meta makes major investment in Scale AI, takes in CEO
  • TechInformed: Meta scales up AI bid with $14bn investment and Trump sticks with Starlink despite Musk feud
  • siliconangle.com: Meta files lawsuit against AI firm behind fake nonconsensual nude images
  • www.laptopmag.com: A simple mistake in the Meta AI app could expose your deepest secrets
  • felloai.com: News article about Meta recruiting AI researchers.
  • siliconangle.com: Users of new Meta AI app unknowingly make chatbot logs public
  • Tech News | Euronews RSS: Meta bets big on start-up Scale AI with €12 billion investment and hires its co-founder
  • www.rdworldonline.com: Google abandons $200M Scale AI partnership after Meta’s $14.3B stake; Zuckerberg offers $10M+ to poach top AI talent
  • Charlie Fink: Meta’s Scale AI Bet, Hollywood Sues AI, XR Industry Readies AR Glasses

@machinelearning.apple.com //
Apple researchers have released a new study questioning the capabilities of Large Reasoning Models (LRMs), casting doubt on the industry's pursuit of Artificial General Intelligence (AGI). The research paper, titled "The Illusion of Thinking," reveals that these models, including those from OpenAI, Google DeepMind, Anthropic, and DeepSeek, experience a 'complete accuracy collapse' when faced with complex problems. Unlike existing evaluations primarily focused on mathematical and coding benchmarks, this study evaluates the reasoning traces of these models, offering insights into how LRMs "think".

Researchers tested various models, including OpenAI's o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet, using puzzles like the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These environments allowed for the manipulation of complexity while maintaining consistent logical structures. The team discovered that standard language models surprisingly outperformed LRMs in low-complexity scenarios, while LRMs only demonstrated advantages in medium-complexity tasks. However, all models experienced a performance collapse when faced with highly complex tasks.

The study suggests that the so-called reasoning of LRMs may be more akin to sophisticated pattern matching, which is fragile and prone to failure when challenged with significant complexity. Apple's research team identified three distinct performance regimes: low-complexity tasks where standard models outperform LRMs, medium-complexity tasks where LRMs show advantages, and high-complexity tasks where all models collapse. Apple has begun integrating powerful generative AI into its own apps and experiences. The new Foundation Models framework gives app developers access to the on-device foundation language model.

Recommended read:
References :
  • THE DECODER: LLMs designed for reasoning, like Claude 3.7 and Deepseek-R1, are supposed to excel at complex problem-solving by simulating thought processes.
  • machinelearning.apple.com: Apple machine learning discusses Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
  • PPC Land: PPC Land reports on Apple study exposes fundamental limits in AI reasoning models through puzzle tests.
  • the-decoder.com: The Decoder covers Apple's study, highlighting the limitation in thinking abilities of reasoning models.
  • felloai.com: In a breakthrough paper, Apple researchers reveal the uncomfortable truth about large reasoning models (LRMs): their internal “thought processes” might be nothing more than performative illusions.
  • Gadgets 360: Apple Claims AI Reasoning Models Suffer From ‘Accuracy Collapse’ When Solving Complex Problems
  • futurism.com: Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry
  • The Register - Software: Apple AI boffins puncture AGI hype as reasoning models flail on complex planning
  • www.theguardian.com: Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
  • chatgptiseatingtheworld.com: Apple researchers cast doubt on AI reasoning models of other companies
  • www.livescience.com: AI reasoning models aren’t as smart as they were cracked up to be, Apple study claims
  • www.computerworld.com: Apple warns: GenAI still isn’t very smart
  • Fello AI: Apple's research paper, "The Illusion of Thinking," argues that large reasoning models face a complete accuracy collapse beyond certain complexities, highlighting limitations in their reasoning capabilities.
  • WIRED: Apple's research paper challenges the claims of significant reasoning capabilities in current AI models, particularly those relying on pattern matching instead of genuine understanding.
  • Analytics Vidhya: Apple Exposes Reasoning Flaws in o3, Claude, and DeepSeek-R1
  • www.itpro.com: ‘A complete accuracy collapse’: Apple throws cold water on the potential of AI reasoning – and it's a huge blow for the likes of OpenAI, Google, and Anthropic
  • www.tomshardware.com: Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models
  • Digital Information World: Apple study questions AI reasoning models in stark new report
  • www.theguardian.com: A research paper by Apple has taken the AI world by storm, all but eviscerating the popular notion that large language models (LLMs, and their newest variant, LRMs, large reasoning models) are able to reason reliably.
  • AI Alignment Forum: Researchers at Apple released a paper provocatively titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”, which “challenge[s] prevailing assumptions about [language model] capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning”.
  • Ars OpenForum: New Apple study challenges whether AI models truly “reason” through problems
  • 9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study
  • AI News | VentureBeat: Do reasoning models really “think” or not? Apple research sparks lively debate, response
  • www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation

Chris McKay@Maginative //
Google's AI research notebook, NotebookLM, has introduced a significant upgrade that enhances collaboration by allowing users to publicly share their AI-powered notebooks with a simple link. This new feature, called Public Notebooks, enables users to share their research summaries and audio overviews generated by AI with anyone, without requiring sign-in or permissions. This move aims to transform NotebookLM from a personal research tool into an interactive, AI-powered knowledge hub, facilitating easier distribution of study guides, project briefs, and more.

The public sharing feature provides viewers with the ability to interact with AI-generated content like FAQs and overviews, as well as ask questions in chat. However, they cannot edit the original sources, ensuring the preservation of ownership while enabling discovery. To share a notebook, users can click the "Share" button, switch the setting to "Anyone with the link," and copy the link. This streamlined process is similar to sharing Google Docs, making it intuitive and accessible for users.

This upgrade is particularly beneficial for educators, startups, and nonprofits. Teachers can share curated curriculum summaries, startups can distribute product manuals, and nonprofits can publish donor briefing documents without the need to build a dedicated website. By enabling easier sharing of AI-generated notes and audio overviews, Google is demonstrating how generative AI can be integrated into everyday productivity workflows, making NotebookLM a more grounded tool for sense-making of complex material.

Recommended read:
References :
  • Maginative: Google’s NotebookLM Now Lets You Share AI-Powered Notebooks With a Link
  • The Official Google Blog: NotebookLM is adding a new way to share your own notebooks publicly.
  • PCMag Middle East ai: Google Makes It Easier to Share Your NotebookLM Docs, AI Podcasts
  • AI & Machine Learning: How Alpian is redefining private banking for the digital age with gen AI
  • venturebeat.com: Google quietly launches AI Edge Gallery, letting Android phones run AI without the cloud
  • TestingCatalog: Google’s Kingfall model briefly goes live on AI Studio before lockdown
  • shellypalmer.com: NotebookLM, one of Google's most viral AI products, just got a really useful upgrade: users can now publicly share notebooks with a link.

Alexey Shabanov@TestingCatalog //
References: Data Phoenix , Maginative , TestingCatalog ...
Perplexity AI is rapidly expanding its presence in the AI market through strategic integrations and innovative features. The company has launched Perplexity Labs, a new tool for Pro subscribers designed to automate tasks such as creating reports, spreadsheets, and mini web apps. This feature leverages AI research, code execution, and content generation, positioning Perplexity as a versatile platform for both information retrieval and content creation. Labs can generate and execute code for data structuring, create interactive web apps, and produce various file types, making it well-suited for diverse projects from marketing campaigns to business analysis.

The startup is also making strides in device integration. Samsung is reportedly nearing a wide-ranging deal with Perplexity that includes investment and deep integration into devices, the Bixby assistant, and the web browser. This partnership could see Perplexity pre-installed on upcoming Galaxy S26 series phones, potentially replacing Google Gemini as the default AI assistant. The integration might also extend to Samsung Internet, offering users more advanced and personalized AI experiences directly within their web browsing.

Furthermore, Perplexity is enhancing its AI-driven search capabilities within the Comet Browser. Users can now observe Perplexity AI controlling pages in the Comet Browser, with visual indicators showing actions like clicking and filling forms. This new feature allows for more interactive and transparent AI-driven automation, benefiting users who automate repetitive workflows such as data entry and testing. This positions Perplexity as a pioneer in bringing interactive and transparent AI-driven automation to the browser.

Recommended read:
References :
  • Data Phoenix: Perplexity launches Labs, an AI tool that helps users create reports, dashboards, and web apps
  • Maginative: Perplexity's new Labs feature for Pro subscribers automates time-consuming tasks like creating reports, spreadsheets, and mini web apps using AI research and code execution.
  • www.techradar.com: The Samsung Galaxy S26 series could have Perplexity AI baked in
  • TestingCatalog: Users can now watch Perplexity AI control pages in Comet Browser
  • Mark Gurman: NEW: Samsung is nearing wide-ranging deal with Perplexity on an investment and deep integration into devices, Bixby assistant and web browser, I’m told.
  • Dataconomy: Samsung may invest in Perplexity and integrate it into Galaxy phones
  • PCMag Middle East ai: Samsung's Galaxy S26 May Drop Google Gemini as Its Default AI Chatbot
  • Latest news: If Perplexity's app and assistant get preloaded on upcoming Galaxies, what happens to Google Gemini integration?
  • www.lifewire.com: Samsung + Perplexity Might Be the AI Power Couple That Could Redefine Your Phone

@www.microsoft.com //
Microsoft is making significant strides in integrating Artificial Intelligence into real-world applications, with a strong emphasis on its impact on healthcare and enterprise solutions. Microsoft Research President Peter Lee is revisiting his earlier optimistic predictions about AI's transformative potential in healthcare, acknowledging both the successes and unforeseen challenges. This reassessment is being done through "The AI Revolution in Medicine, Revisited" podcast series, featuring discussions with thought leaders like Ethan Mollick and Azeem Azhar, who are exploring the multifaceted ways AI is reshaping healthcare and organizational systems. Their analysis covers areas like medical scribing, clinician support, and consumer health monitoring.

Microsoft CVP Charles Lamanna is championing the concept of the "Agent-Native Enterprise," highlighting how AI agents and open standards are poised to revolutionize business applications. In a recent discussion, Lamanna outlined strategies for scaling AI agents within organizations, rethinking organizational structures, and building in what he terms the "post-biz app era." He emphasized the importance of customer obsession and extreme ownership, principles he brought back to Microsoft from his own entrepreneurial experience. Lamanna believes that AI will enable the shift towards generalist teams, allowing enterprises to focus on high-impact AI projects.

In other news, ZeniMax QA workers have reached a tentative union contract with Microsoft after two years of negotiations. The new contract, covering over 300 employees, includes substantial wage increases, new minimum salaries, and protections for workers against the impacts of AI. The agreement also introduces a crediting policy recognizing QA workers' contributions to video games. Jessee Leese, a QA tester and ZeniMax Workers United-CWA bargaining committee member, hailed the contract as a potential standard for fair treatment in the video game industry, encouraging other professionals to take action. The tentative contract is awaiting ratification by union members.

Recommended read:
References :
  • www.engadget.com: ZeniMax QA workers win tentative union contract with Microsoft
  • www.madrona.com: The End of Biz Apps? AI, Agility, and The Agent-Native Enterprise from Microsoft CVP Charles Lamanna
  • Microsoft Research: What AI’s impact on individuals means for the health workforce and industry

Alexey Shabanov@TestingCatalog //
References: TechCrunch , TestingCatalog , Data Phoenix ...
Perplexity has unveiled Perplexity Labs, a new AI-powered tool designed for Pro subscribers, aiming to revolutionize the creation of work deliverables. Labs automates tasks like generating reports, spreadsheets, dashboards, and even mini web apps, leveraging AI research and code execution to bring projects from ideation to completion. It functions as an AI-driven team, providing users with a comprehensive suite of tools to transform their ideas into tangible results, marking a significant move beyond traditional search functionalities.

Labs stands out by investing a minimum of 10 minutes in self-supervised work, conducting web browsing, writing and executing code, and organizing data to achieve its objectives. This extended timeframe allows the AI to crunch numbers, apply formulas, generate visuals, and construct interactive web apps, all without requiring the user to lift a finger. The technology combines various AI capabilities Perplexity has developed, packaging the output into an "Assets" tab for easy access and download.

Available on web, iOS, and Android, with Mac and Windows apps on the horizon, Perplexity Labs is accessible for Pro subscribers at $20 per month. With the mini web apps feature being particularly ambitious, Labs can build and deploy simple interactive websites directly within the interface, such as dashboards, slideshows, or data visualization tools, without the user needing any coding knowledge. This move aims to shift Perplexity's positioning from a "better Google" to a "personal research assistant and worker," providing training wheels for building AI agents and automating time-consuming tasks.

Recommended read:
References :
  • TechCrunch: Perplexity’s new tool can generate spreadsheets, dashboards, and more
  • TestingCatalog: Perplexity AI rolled out Perplexity Labs for Pro subscribers
  • Latest news: 5 projects Perplexity's new Labs AI tool can whip up for you now - in minutes
  • Data Phoenix: Article discussing Perplexity's new Labs feature.
  • Maginative: Report on Perplexity's launch of the new Labs feature.
  • www.itpro.com: Sick and tired of spreadsheets? Perplexity’s new tools can help with that
  • www.analyticsvidhya.com: I Tried Perplexity Labs and Here’s What I Found

Dashveenjit Kaur@TechHQ //
Dell Technologies has secured a contract with the U.S. Department of Energy to construct the next-generation NERSC-10 supercomputer, a project powered by NVIDIA's Vera Rubin architecture. This new system, dubbed "Doudna" after Nobel laureate Jennifer Doudna, a pioneer in CRISPR gene-editing technology, is poised to be a major federal investment in scientific computing infrastructure. Energy Secretary Chris Wright announced the contract during a visit to Lawrence Berkeley National Laboratory, emphasizing that the deployment in 2026 is crucial for maintaining American technological leadership amidst increasing global competition in AI and quantum computing.

The "Doudna" supercomputer, also known as NERSC-10, aims to significantly accelerate scientific research across multiple domains, including fusion energy, astronomy, and life sciences. Designed to serve 11,000 researchers, it represents an integration of artificial intelligence, quantum workflows, and real-time data streaming from experimental facilities. Unlike traditional supercomputers, Doudna’s architecture emphasizes coherent memory access between CPUs and GPUs, facilitating efficient data sharing between heterogeneous processors which is essential for modern AI-accelerated scientific workflows.

The Doudna system is expected to deliver a 10x increase in scientific output compared to its predecessor, Perlmutter, while only consuming 2-3x the power, translating to a 3-5x improvement in performance per watt. Nick Wright, advanced technologies group lead and Doudna chief architect at NERSC, stated, "We’re not just building a faster computer, we’re building a system that helps researchers think bigger and discover sooner." NVIDIA's Vera Rubin platform introduces hardware-level optimizations specifically designed for the convergence of simulation, machine learning, and quantum algorithm development, marking a significant advancement in cutting-edge research capabilities.

Recommended read:
References :
  • blogs.nvidia.com: Ready for a front-row seat to the next scientific revolution? That’s the idea behind Doudna — a groundbreaking supercomputer announced today at Lawrence Berkeley National Laboratory in Berkeley, California.
  • insidehpc.com: The new system, due in 2026, is named after Jennifer Doudna, the Berkeley Lab-based biochemist who won the 2020 Nobel Prize for Chemistry for her work on gene-editing technology.
  • TechHQ: Nvidia Vera Rubin supercomputer to serve researchers in fusion energy, astronomy, and life sciences.
  • techxplore.com: A new supercomputer named after a winner of the Nobel Prize in chemistry will help power artificial intelligence technology and scientific discoveries from a perch in the hills above the University of California, Berkeley, federal officials said Thursday.
  • insidehpc.com: DOE Announces “Doudna” Dell-NVIDIA Supercomputer at NERSC
  • techhq.com: Nvidia Vera Rubin supercomputer to serve researchers in fusion energy, astronomy, and life sciences. Dell’s system targets 10x performance, 3-5x better power efficiency, to be deployed in 2026.

@www.quantamagazine.org //
Researchers are making strides in AI reasoning and efficiency, tackling both complex problem-solving and the energy consumption of these systems. One promising area is reversible computing, in which programs can run backward as easily as forward, theoretically saving energy by avoiding data deletion. Michael Frank, a researcher who studies the physical limits of computation, found that reversible approaches could sustain computational progress as conventional scaling runs into those limits. Christof Teuscher at Portland State University emphasized the potential for significant power savings with this approach.
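
A toy illustration (not from the research itself) of what reversibility means: a step that retains enough information to be run backward exactly, in contrast to a destructive overwrite like `y = 0`, which erases data and, by Landauer's principle, must dissipate energy.

```python
def forward(x: int, y: int) -> tuple[int, int]:
    """A reversible update: the XOR result, together with x, still determines y."""
    return x, x ^ y

def backward(x: int, y_xored: int) -> tuple[int, int]:
    """Exact inverse of forward: XOR-ing again recovers the original y."""
    return x, x ^ y_xored

a, b = 42, 7
state = forward(a, b)          # no information was deleted...
assert backward(*state) == (a, b)  # ...so the step can be undone exactly
```

Real reversible hardware applies this idea at the gate level, but the principle is the same: avoid erasure, and the thermodynamic floor on energy per operation drops away.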

An evolution of the LLM-as-a-Judge paradigm is emerging. Meta AI has introduced the J1 framework which shifts the paradigm of LLMs from passive generators to active, deliberative evaluators through self-evaluation. This approach, detailed in "J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning," addresses the growing need for rigorous and scalable evaluation as AI systems become more capable and widely deployed. By reframing judgment as a structured reasoning task trained through reinforcement learning, J1 aims to create models that perform consistent, interpretable, and high-fidelity evaluations.

Soheil Feizi, an associate professor at the University of Maryland, has received a $1 million federal grant to advance foundational research in reasoning AI models. This funding, stemming from a Presidential Early Career Award for Scientists and Engineers (PECASE), will support his work in defending large language models (LLMs) against attacks, identifying weaknesses in how these models learn, encouraging transparent, step-by-step logic, and understanding the "reasoning tokens" that drive decision-making. Feizi plans to explore innovative approaches like live activation probing and novel reinforcement-learning designs, aiming to transform theoretical advancements into practical applications and real-world usages.


Alexey Shabanov@TestingCatalog //
Google has launched the NotebookLM mobile app for Android and iOS, bringing its AI-powered research assistant to mobile devices. This release marks a significant step in expanding access to NotebookLM, which was initially launched as a web-based tool in 2023 under the codename "Project Tailwind." The mobile app aims to offer personalized learning and efficient content synthesis, allowing users to interact with and process information on the go. After months of waiting, the app is now officially available to everyone, offering NotebookLM's core features, with further functionality promised.

The NotebookLM mobile app focuses on audio-first experiences, with features like audio overviews that generate podcast-style summaries. These summaries can be played directly from the list view without opening a project, making it feel like a media player for casual content consumption. Users can also download audio overviews for offline playback and listen in the background, supporting learning during commutes or other activities. Moreover, the app supports interactive mode in audio sessions, where users can ask questions mid-playback, creating a live dialogue experience.

The mobile app retains the functionality of the web version, including the ability to create new notebooks and upload sources like PDFs, Google Docs, and YouTube videos. Users can add sources directly from their mobile devices by using the "Share" button in any app, making it easier to build and maintain research libraries. NotebookLM relies only on user-uploaded sources, ensuring reliable and verifiable information. The rollout underscores Google’s evolving strategy for NotebookLM, transitioning from a productivity assistant to a multimodal content platform, appealing to students, researchers, and content creators seeking flexible ways to absorb structured knowledge.

Recommended read:
References :
  • AI News | VentureBeat: Google finally launches NotebookLM mobile app at I/O: hands-on, first impressions
  • www.laptopmag.com: An exclusive look at Google's NotebookLM app on Android and iOS
  • TestingCatalog: Google launches NotebookLM mobile app with audio-first features on mobile
  • www.tomsguide.com: NotebookLM just arrived on Android — and it can turn your notes into podcasts
  • THE DECODER: Google launches NotebookLM mobile app for Android and iOS
  • MarkTechPost: Google AI Releases Standalone NotebookLM Mobile App with Offline Audio and Seamless Source Integration
  • www.techradar.com: Google's free NotebookLM AI app is out now for Android and iOS – here's why it's a day-one download for me