News from the AI & ML world

DeeperML - #researchers

@colab.research.google.com //
References: Magenta, THE DECODER, github.com ...
Google's Magenta project has unveiled Magenta RealTime (Magenta RT), an open-weights live music model designed for interactive music creation, control, and performance. The model builds on Google DeepMind's research in real-time generative music and opens new possibilities for live musical exploration. Magenta RT is a significant advance in AI-driven music technology, aimed both at lowering the skill barrier to music-making and at augmenting existing musical practice. As an open-weights model, Magenta RT is intended to eventually run locally on consumer hardware, underscoring Google's commitment to democratizing AI music creation tools.

Magenta RT is an 800-million-parameter autoregressive transformer trained on approximately 190,000 hours of instrumental stock music. It leverages SpectroStream for high-fidelity audio (48 kHz stereo) and a newly developed MusicCoCa embedding model, inspired by MuLan and CoCa. This combination allows users to dynamically shape and morph music in real time by manipulating style embeddings, effectively blending musical styles, instruments, and attributes. The model code is available on GitHub, and the weights are available on Google Cloud Storage and Hugging Face under permissive licenses with some additional bespoke terms.
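To make the style-embedding idea concrete, here is a minimal sketch of blending two style vectors by weighted averaging. The helper names and the embedding dimension are illustrative assumptions, not the published Magenta RT API:

```python
# Illustrative sketch only: blending styles by averaging embedding vectors.
# The 768-dim size and these helpers are assumptions, not Magenta RT's API;
# in the real model, style vectors come from the MusicCoCa embedding model.
import numpy as np

def blend_styles(embeddings: list, weights: list) -> np.ndarray:
    """Return a convex combination of style embedding vectors."""
    w = np.asarray(weights, dtype=np.float32)
    w = w / w.sum()                      # normalize so the weights sum to 1
    stacked = np.stack(embeddings)       # shape: (n_styles, embed_dim)
    return (w[:, None] * stacked).sum(axis=0)

# e.g. steer a performance 70% toward style A, 30% toward style B
style_a = np.random.randn(768).astype(np.float32)  # placeholder embedding
style_b = np.random.randn(768).astype(np.float32)  # placeholder embedding
blended = blend_styles([style_a, style_b], weights=[0.7, 0.3])
```

Sweeping the weights over time is what would let a performer morph smoothly from one style toward another rather than switching abruptly.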

Magenta RT operates by generating music in sequential chunks, conditioned on both previous audio output and style embeddings. This approach enables interactive soundscapes for performances and virtual spaces. The model achieves a real-time factor of 1.6 on a free-tier Colab TPU (v2-8), generating two seconds of audio in 1.25 seconds. This unlocks the potential to explore entirely new musical landscapes, experiment with never-before-heard instrument combinations, and craft unique sonic textures, fostering innovative forms of musical expression and performance.
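A rough sketch of that chunked loop is below, assuming a hypothetical generate_chunk interface (the actual API may differ). It also shows where the real-time factor comes from: 2.0 seconds of audio produced in 1.25 seconds of compute gives 2.0 / 1.25 = 1.6.

```python
# Minimal sketch of chunked, real-time generation; `model.generate_chunk`
# is a hypothetical stand-in, not the actual Magenta RT interface.
import time

CHUNK_SECONDS = 2.0  # the model generates audio roughly two seconds at a time

def stream(model, style_embedding, n_chunks: int):
    context = None  # previous audio output that conditions the next chunk
    for _ in range(n_chunks):
        start = time.monotonic()
        audio = model.generate_chunk(context=context, style=style_embedding)
        elapsed = time.monotonic() - start
        rtf = CHUNK_SECONDS / elapsed  # real-time factor; >1.0 keeps up live
        context = audio                # feed the output back as conditioning
        yield audio, rtf
```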

Recommended read:
References :
  • Magenta: Today, we’re happy to share a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and perform music in the moment.
  • THE DECODER: Google has released Magenta RealTime (Magenta RT), an open-source AI model for live music creation and control. The article appeared first on The Decoder.
  • github.com: Magenta RealTime: An Open-Weights Live Music Model
  • aistudio.google.com: Magenta RealTime: An Open-Weights Live Music Model
  • huggingface.co: Sharing a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and perform music in the moment
  • Magenta: Magenta RealTime: An Open-Weights Live Music Model
  • Magenta: Magenta RT is the latest in a series of models and applications developed as part of the Magenta Project.
  • www.marktechpost.com: Google Researchers Release Magenta RealTime: An Open-Weight Model for Real-Time AI Music Generation
  • Simon Willison's Weblog: Fun new "live music model" release from Google DeepMind: Today, we’re happy to share a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and perform music in the moment.
  • MarkTechPost: Google’s Magenta team has introduced Magenta RealTime (Magenta RT), an open-weight, real-time music generation model that brings unprecedented interactivity to generative audio.

nftjedi@chatgptiseatingtheworld.com //
Apple researchers recently published a study titled "The Illusion of Thinking," suggesting that large language models (LLMs) struggle with true reasoning and rely instead on pattern matching. The study based its findings on tasks like the Tower of Hanoi puzzle, where models purportedly failed as complexity increased, leading to the conclusion that these models possess limited problem-solving abilities. Those conclusions are now under scrutiny, however, with critics arguing that the experiments were not fairly designed.

Alex Lawsen of Open Philanthropy has published a counter-study challenging the foundations of Apple's claims. Lawsen argues that models like Claude, Gemini, and OpenAI's latest systems weren't failing due to cognitive limits, but because the evaluation methods didn't account for key technical constraints. One issue raised was that models were often cut off before they could finish their answers by the maximum token limit, a built-in cap on output length, which Apple's evaluation counted as a reasoning failure rather than a practical limitation.
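The arithmetic behind that objection is easy to check. The sketch below uses an assumed tokens-per-move figure (purely for illustration; neither paper reports one) to show how quickly a complete Tower of Hanoi move list outgrows a typical output budget:

```python
# Back-of-envelope version of Lawsen's token-budget argument.
TOKENS_PER_MOVE = 7  # assumed cost of spelling out one move; not a measured value

for n_disks in (10, 12, 15):
    moves = 2**n_disks - 1           # optimal Tower of Hanoi solution length
    tokens = moves * TOKENS_PER_MOVE
    print(f"{n_disks} disks: {moves:,} moves, roughly {tokens:,} output tokens")

# 15 disks -> 32,767 moves, on the order of 230k tokens: far beyond common
# output caps, so a truncated transcript need not mean a reasoning failure.
```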

Another point of contention involved the River Crossing test, where models faced unsolvable problem setups. When the models correctly identified the tasks as impossible and refused to attempt them, they were still marked wrong. Furthermore, the evaluation system strictly judged outputs against exhaustive solutions, failing to credit models for partial but correct answers, pattern recognition, or strategic shortcuts. To illustrate, Lawsen demonstrated that when models were instructed to write a program to solve the Hanoi puzzle, they delivered accurate, scalable solutions even with 15 disks, contradicting Apple's assertion of limitations.
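For reference, a complete, scalable Hanoi solver of the kind Lawsen describes fits in a dozen lines. The version below is a generic textbook implementation, not the models' actual output:

```python
# Recursive Tower of Hanoi solver; correct for any number of disks.
def hanoi(n: int, src: str, aux: str, dst: str, moves: list) -> None:
    """Append the optimal move sequence for n disks to `moves`."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)  # park n-1 disks on the spare peg
    moves.append((src, dst))            # move the largest disk to the target
    hanoi(n - 1, aux, src, dst, moves)  # restack the n-1 disks on top of it

moves = []
hanoi(15, "A", "B", "C", moves)
assert len(moves) == 2**15 - 1          # 32,767 moves, generated instantly
```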

Recommended read:
References :
  • chatgptiseatingtheworld.com: Research: Did Apple researchers overstate “The Illusion of Thinking” in reasoning models? Opus, Lawsen think so.
  • Digital Information World: Apple’s AI Critique Faces Pushback Over Flawed Testing Methods
  • NextBigFuture.com: Apple Researcher Claims Illusion of AI Thinking Versus OpenAI Solving Ten Disk Puzzle
  • Bernard Marr: Beyond The Hype: What Apple's AI Warning Means For Business Leaders

@machinelearning.apple.com //
Apple researchers have released a new study questioning the capabilities of Large Reasoning Models (LRMs), casting doubt on the industry's pursuit of Artificial General Intelligence (AGI). The research paper, titled "The Illusion of Thinking," reveals that these models, including those from OpenAI, Google DeepMind, Anthropic, and DeepSeek, experience a 'complete accuracy collapse' when faced with complex problems. Unlike existing evaluations primarily focused on mathematical and coding benchmarks, this study evaluates the reasoning traces of these models, offering insights into how LRMs "think".

Researchers tested various models, including OpenAI's o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet, using puzzles like the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These environments allow complexity to be dialed up or down while keeping the logical structure constant. The team discovered that standard language models surprisingly outperformed LRMs in low-complexity scenarios, while LRMs demonstrated advantages only in medium-complexity tasks. All models, however, experienced a performance collapse when faced with highly complex tasks.

The study suggests that the so-called reasoning of LRMs may be closer to sophisticated pattern matching: fragile and prone to failure when challenged with significant complexity. The research team formalizes these results as three distinct performance regimes, with standard models ahead at low complexity, LRMs ahead at medium complexity, and all models collapsing at high complexity. Separately, Apple has begun integrating generative AI into its own apps and experiences; the new Foundation Models framework gives app developers access to the on-device foundation language model.

Recommended read:
References :
  • THE DECODER: LLMs designed for reasoning, like Claude 3.7 and Deepseek-R1, are supposed to excel at complex problem-solving by simulating thought processes.
  • machinelearning.apple.com: Apple machine learning discusses Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
  • PPC Land: PPC Land reports on Apple study exposes fundamental limits in AI reasoning models through puzzle tests.
  • the-decoder.com: The Decoder covers Apple's study, highlighting the limitation in thinking abilities of reasoning models.
  • felloai.com: In a breakthrough paper, Apple researchers reveal the uncomfortable truth about large reasoning models (LRMs): their internal “thought processes” might be nothing more than performative illusions.
  • Gadgets 360: Apple Claims AI Reasoning Models Suffer From ‘Accuracy Collapse’ When Solving Complex Problems
  • futurism.com: Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry
  • The Register - Software: Apple AI boffins puncture AGI hype as reasoning models flail on complex planning
  • www.theguardian.com: Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds
  • chatgptiseatingtheworld.com: Apple researchers cast doubt on AI reasoning models of other companies
  • www.livescience.com: AI reasoning models aren’t as smart as they were cracked up to be, Apple study claims
  • www.computerworld.com: Apple warns: GenAI still isn’t very smart
  • Fello AI: Apple's research paper, "The Illusion of Thinking," argues that large reasoning models face a complete accuracy collapse beyond certain complexities, highlighting limitations in their reasoning capabilities.
  • WIRED: Apple's research paper challenges the claims of significant reasoning capabilities in current AI models, particularly those relying on pattern matching instead of genuine understanding.
  • Analytics Vidhya: Apple Exposes Reasoning Flaws in o3, Claude, and DeepSeek-R1
  • www.itpro.com: ‘A complete accuracy collapse’: Apple throws cold water on the potential of AI reasoning – and it's a huge blow for the likes of OpenAI, Google, and Anthropic
  • www.tomshardware.com: Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models
  • Digital Information World: Apple study questions AI reasoning models in stark new report
  • www.theguardian.com: A research paper by Apple has taken the AI world by storm, all but eviscerating the popular notion that large language models (LLMs, and their newest variant, LRMs, large reasoning models) are able to reason reliably.
  • AI Alignment Forum: Researchers at Apple released a paper provocatively titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”, which “challenge[s] prevailing assumptions about [language model] capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning”.
  • Ars OpenForum: New Apple study challenges whether AI models truly “reason” through problems
  • 9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study
  • AI News | VentureBeat: Do reasoning models really “think” or not? Apple research sparks lively debate, response
  • www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation

Chris McKay@Maginative //
Google's AI research notebook, NotebookLM, has introduced a significant upgrade that enhances collaboration by allowing users to publicly share their AI-powered notebooks with a simple link. This new feature, called Public Notebooks, enables users to share their research summaries and audio overviews generated by AI with anyone, without requiring sign-in or permissions. This move aims to transform NotebookLM from a personal research tool into an interactive, AI-powered knowledge hub, facilitating easier distribution of study guides, project briefs, and more.

The public sharing feature lets viewers interact with AI-generated content like FAQs and overviews, and ask questions in chat; however, they cannot edit the original sources, preserving ownership while enabling discovery. To share a notebook, users click the "Share" button, switch the setting to "Anyone with the link," and copy the link. The process mirrors sharing a Google Doc, making it intuitive and accessible.

This upgrade is particularly beneficial for educators, startups, and nonprofits. Teachers can share curated curriculum summaries, startups can distribute product manuals, and nonprofits can publish donor briefing documents without the need to build a dedicated website. By enabling easier sharing of AI-generated notes and audio overviews, Google is demonstrating how generative AI can be integrated into everyday productivity workflows, making NotebookLM a more grounded tool for sense-making of complex material.

Recommended read:
References :
  • Maginative: Google’s NotebookLM Now Lets You Share AI-Powered Notebooks With a Link
  • The Official Google Blog: NotebookLM is adding a new way to share your own notebooks publicly.
  • PCMag Middle East ai: Google Makes It Easier to Share Your NotebookLM Docs, AI Podcasts
  • AI & Machine Learning: How Alpian is redefining private banking for the digital age with gen AI
  • venturebeat.com: Google quietly launches AI Edge Gallery, letting Android phones run AI without the cloud
  • TestingCatalog: Google’s Kingfall model briefly goes live on AI Studio before lockdown
  • shellypalmer.com: NotebookLM, one of Google's most viral AI products, just got a really useful upgrade: users can now publicly share notebooks with a link.

@techhq.com //
References: TechHQ, futurumgroup.com
Dell Technologies and NVIDIA are collaborating to construct the "Doudna" supercomputer for the U.S. Department of Energy (DOE). Named after Nobel laureate Jennifer Doudna, the system will be housed at the Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center (NERSC) and is slated for deployment in 2026. This supercomputer aims to revolutionize scientific research by merging artificial intelligence (AI) with simulation capabilities, empowering over 11,000 researchers across various disciplines, including fusion energy, astronomy, and life sciences. The project represents a significant federal investment in high-performance computing (HPC) infrastructure, designed to maintain U.S. leadership in AI and scientific discovery.

The Doudna supercomputer, also known as NERSC-10, promises a tenfold increase in scientific output compared to its predecessor, Perlmutter, while consuming only two to three times the power. That works out to a three-to-five-fold improvement in performance per watt (10x the output at 2-3x the power), achieved through advanced chip design and system-level efficiencies. The system integrates high-performance CPUs with coherent GPUs, enabling direct data access and sharing across processors and addressing traditional bottlenecks in scientific computing workflows. Doudna will also be connected to DOE experimental and observational facilities through the Energy Sciences Network (ESnet), enabling seamless data streaming and near real-time analysis.

According to NVIDIA CEO Jensen Huang, Doudna is designed to accelerate scientific workflows and act as a "time machine for science," compressing years of discovery into days. Energy Secretary Chris Wright sees it as essential infrastructure for maintaining American technological leadership in AI and quantum computing. The system emphasizes coherent memory access between CPUs and GPUs, enabling data sharing across heterogeneous processors, a requirement for modern AI-accelerated scientific workflows. The NVIDIA Vera Rubin architecture introduces hardware-level optimizations designed specifically for the convergence of simulation, machine learning, and quantum algorithm development.

Recommended read:
References :
  • TechHQ: Nvidia Vera Rubin supercomputer to serve researchers in fusion energy, astronomy, and life sciences. Dell’s system targets 10x performance, 3-5x better power efficiency, to be deployed in 2026.
  • futurumgroup.com: Can Dell and NVIDIA’s AI Factory 2.0 Solve Enterprise-Scale AI Infrastructure Gaps?