News from the AI & ML world

DeeperML - #research

@the-decoder.com //
Recent developments in AI safety research were highlighted at the Singapore Conference on AI in April 2025, where over 100 experts from eleven countries convened to establish shared priorities for ensuring the technical safety of AI systems. The "Singapore Consensus on Global AI Safety Research Priorities" emerged from this meeting, focusing on general-purpose AI (GPAI) systems, including language models, multimodal models, and autonomous AI agents. The report strategically avoids political questions, concentrating instead on the technical aspects of AI safety research. The primary objective is to foster a "trusted ecosystem" that promotes AI innovation while proactively addressing potential societal risks.

The consensus report divides technical AI safety research into three critical areas: risk assessment, building trustworthy systems, and post-deployment control. Risk assessment involves developing methods for measuring and predicting risks associated with AI, including standardized audit techniques, benchmarks for identifying dangerous capabilities, and assessing social impacts. A key challenge identified is the "evidence dilemma," balancing the need for concrete evidence of risks against the potential for those risks to escalate rapidly. The report advocates for prospective risk analysis, similar to techniques used in nuclear safety and aviation, to proactively identify and mitigate potential dangers.

Other research focuses on enhancing the capabilities of large language models (LLMs) through methods like reinforcement learning (RL) and improved memory management. One advancement, RL^V, unifies reasoning and verification in LLMs without compromising training scalability, using the LLM's generative capabilities to act as both a reasoner and a verifier. Additionally, recursive summarization is being explored as a way to give LLMs long-term dialog memory: the model repeatedly condenses earlier conversation into a running summary, allowing it to maintain consistent and coherent conversations over many turns. These advances address key limitations of current systems, such as inconsistent recall and the difficulty of verifying the accuracy of their own reasoning.
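The recursive-summarization idea can be illustrated with a minimal sketch, assuming a two-tier memory of a running summary plus a short window of verbatim turns; the `summarize` function below is a trivial stub standing in for an LLM call:

```python
def summarize(previous_summary: str, new_turns: list[str]) -> str:
    """Stub summarizer: in practice an LLM condenses the old summary
    plus the overflowing dialog turns into an updated summary."""
    combined = previous_summary + " " + " ".join(new_turns)
    return combined.strip()[-500:]  # keep the summary bounded

class DialogMemory:
    def __init__(self, window: int = 4):
        self.summary = ""            # long-term memory, recursively updated
        self.recent: list[str] = []  # short-term verbatim turns
        self.window = window

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.window:
            # Fold the oldest turns into the running summary.
            overflow = self.recent[: -self.window]
            self.summary = summarize(self.summary, overflow)
            self.recent = self.recent[-self.window:]

    def context(self) -> str:
        # What would be prepended to the next LLM prompt.
        return f"Summary: {self.summary}\nRecent: {' | '.join(self.recent)}"

mem = DialogMemory(window=2)
for t in ["Hi, I'm Ana.", "I live in Lisbon.", "I like hiking.", "Plan a trip."]:
    mem.add_turn(t)
print(mem.context())
```

Because the summary is rewritten at every overflow rather than appended to, the prompt stays a bounded size no matter how long the conversation runs.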

Recommended read:
References:
  • the-decoder.com: 100 experts call for more research into the control of AI systems
  • www.marktechpost.com: RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning

Carl Franzen@AI News | VentureBeat //
Microsoft has announced the release of Phi-4-reasoning-plus, a new small, open-weight language model designed for advanced reasoning tasks. Building upon the architecture of the previously released Phi-4, this 14-billion parameter model integrates supervised fine-tuning and reinforcement learning to achieve strong performance on complex problems. According to Microsoft, the Phi-4 reasoning models outperform larger language models on several demanding benchmarks, despite their compact size. This new model pushes the limits of small AI, demonstrating that carefully curated data and training techniques can lead to impressive reasoning capabilities.

The Phi-4 reasoning family, consisting of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, is trained specifically for complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Phi-4-reasoning-plus, in particular, extends supervised fine-tuning with outcome-based reinforcement learning, targeting improved performance on high-variance tasks such as competition-level mathematics. Phi-4-mini-reasoning, meanwhile, is designed to bring reasoning capabilities to lower-performance hardware such as mobile devices.

Microsoft CEO Satya Nadella revealed that AI now writes up to 30% of Microsoft's code. The open-weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access. All three are released under the permissive MIT license, allowing broad commercial and enterprise use, as well as fine-tuning and distillation, without restriction.
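As a rough sketch of how such a Hugging Face release is typically used locally, the snippet below assumes the repo id `microsoft/Phi-4-reasoning-plus` and the standard transformers chat-template workflow; check the actual model card before relying on either:

```python
# Hedged sketch: running Phi-4-reasoning-plus locally via the Hugging Face
# transformers library. The repo id is an assumption based on the model's
# name in the announcement; the chat template may also differ in practice.

def build_messages(question: str) -> list[dict]:
    # Reasoning models are prompted as chat turns; the model then emits an
    # extended chain of thought before its final answer.
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 1024) -> str:
    # Heavy imports live inside the function so the sketch stays importable
    # without the transformers dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-4-reasoning-plus"  # assumed Hugging Face repo id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tok.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

At 14B parameters the model fits on a single consumer GPU in reduced precision, which is what makes the "runs on your laptop" framing in the coverage below plausible.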

Recommended read:
References:
  • the-decoder.com: Microsoft's Phi-4-reasoning models outperform larger models and run on your laptop or phone
  • MarkTechPost: Microsoft AI Released Phi-4-Reasoning: A 14B Parameter Open-Weight Reasoning Model that Achieves Strong Performance on Complex Reasoning Tasks
  • AI News | VentureBeat: The release demonstrates that with carefully curated data and training techniques, small models can deliver strong reasoning performance.
  • Maginative: Microsoft’s Phi-4 Reasoning Models Push the Limits of Small AI
  • www.tomshardware.com: Microsoft's CEO reveals that AI writes up to 30% of its code — some projects may have all of its code written by AI
  • Ken Yeung: Microsoft’s New Phi-4 Variants Show Just How Far Small AI Can Go
  • www.tomsguide.com: Microsoft just unveiled new Phi-4 reasoning AI models — here's why they're a big deal
  • Techzine Global: Microsoft is launching three new advanced small language models as an extension of the Phi series. These models have reasoning capabilities that enable them to analyze and answer complex questions effectively.
  • Analytics Vidhya: Microsoft Launches Two Powerful Phi-4 Reasoning Models
  • www.windowscentral.com: Microsoft Introduces Phi-4 Reasoning SLM Models — Still "Making Big Leaps in AI" While Its Partnership with OpenAI Frays
  • Towards AI: Phi-4 Reasoning Models
  • the-decoder.com: Microsoft's Phi 4 responds to a simple "Hi" with 56 thoughts
  • Data Phoenix: Microsoft has introduced three new small language models—Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—that reportedly deliver complex reasoning capabilities comparable to much larger models while maintaining efficiency for deployment across various computing environments.
  • AI News: Microsoft introduces Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, small language models that reportedly deliver complex reasoning capabilities comparable to much larger models while maintaining efficiency for deployment across various computing environments.

@learn.aisingapore.org //
MIT researchers have achieved a breakthrough in artificial intelligence, specifically aimed at enhancing the accuracy of AI-generated code. This advancement focuses on guiding large language models (LLMs) to produce outputs that strictly adhere to the rules and structures of various programming languages, preventing common errors that can cause system crashes. The new technique, developed by MIT and collaborators, ensures that the AI's focus remains on generating valid and accurate code by quickly discarding less promising outputs. This approach not only improves code quality but also significantly boosts computational efficiency.

This efficiency gain allows smaller LLMs to outperform larger models in producing accurate, well-structured outputs across diverse real-world scenarios, including molecular biology and robotics. The new approach addresses the shortcomings of existing techniques, which either distort the model's intended meaning or are too slow for complex tasks: it steers the model toward text that adheres to a target structure, such as a programming language, while remaining error-free, and does so efficiently enough for practical use.
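The core idea of enforcing structure while quickly discarding less promising outputs can be illustrated with a simplified sketch (not the researchers' actual algorithm): several partial outputs are kept, continuations that violate a toy grammar check are pruned immediately, and compute is reallocated to the highest-weight survivors:

```python
# Illustrative sketch only: constraint-guided generation with early pruning.
# `toy_lm` stands in for an LLM's next-token distribution, and `valid_prefix`
# stands in for a real grammar or parser check.
import heapq
import math

def valid_prefix(text: str) -> bool:
    """Toy structural check: parentheses must never close before they open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return True

def toy_lm(prefix: str) -> dict[str, float]:
    """Stand-in next-token distribution (a real system would query the LLM)."""
    return {"(": 0.4, ")": 0.4, "x": 0.2}

def constrained_search(steps: int = 4, beam: int = 3) -> list[str]:
    particles = [("", 0.0)]  # (partial text, log-weight)
    for _ in range(steps):
        candidates = []
        for text, logw in particles:
            for tok, p in toy_lm(text).items():
                new = text + tok
                if valid_prefix(new):  # discard invalid continuations early
                    candidates.append((new, logw + math.log(p)))
        # Reallocate effort: keep only the most promising partial outputs.
        particles = heapq.nlargest(beam, candidates, key=lambda c: c[1])
    return [t for t, _ in particles]

print(constrained_search())
```

Pruning at the prefix level is what saves compute: invalid branches are abandoned after one token instead of being generated to completion and rejected afterward.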

The implications of this research extend beyond academic circles, potentially revolutionizing programming assistants, AI-driven data analysis, and scientific discovery tools. By enabling non-experts to control AI-generated content, such as business professionals creating complex SQL queries using natural language prompts, this architecture could democratize access to advanced programming and data manipulation. The findings will be presented at the International Conference on Learning Representations.

Recommended read:
References:
  • LearnAI: Making AI-generated code more accurate in any language | MIT News
  • news.mit.edu: A new technique automatically guides an LLM toward outputs that adhere to the rules of whatever programming language or other format is being used.
  • learn.aisingapore.org: Making AI-generated code more accurate in any language | MIT News
  • techxplore.com: Making AI-generated code more accurate in any language

Maximilian Schreiner@THE DECODER //
Anthropic has announced major updates to its AI assistant, Claude, introducing both an autonomous research capability and Google Workspace integration. These enhancements are designed to transform Claude into a more versatile tool, particularly for enterprise users, and directly challenge OpenAI and Microsoft in the competitive market for AI productivity tools. The new "Research" feature allows Claude to conduct systematic, multi-step investigations across internal work contexts and the web. It operates autonomously, performing iterative searches to explore various angles of a query and resolve open questions, ensuring thorough answers supported by citations.

Anthropic's Google Workspace integration expands Claude's ability to interact with Gmail, Calendar, and Google Docs. By securely accessing emails, calendar events, and documents, Claude can compile meeting notes, extract action items from email threads, and search relevant files without manual uploads or repeated context-setting. This functionality is designed to benefit diverse user groups, from marketing and sales teams to engineers and students, by streamlining workflows and enhancing productivity. For Enterprise plan administrators, Anthropic also offers an additional Google Docs cataloging function that uses retrieval augmented generation techniques to index organizational documents securely.

The Research feature is currently available in early beta for Max, Team, and Enterprise plans in the United States, Japan, and Brazil, while the Google Workspace integration is available in beta for all paid users globally. Anthropic emphasizes that these updates are part of an ongoing effort to make Claude a robust collaborative partner. The company plans to expand the range of available content sources and give Claude the ability to conduct even more in-depth research in the coming weeks. With its focus on enterprise-grade security and speed, Anthropic is betting that Claude's ability to deliver quick and well-researched answers will win over busy executives.

Recommended read:
References:
  • analyticsindiamag.com: Anthropic Releases New Research Feature for Claude
  • venturebeat.com: Claude just gained superpowers: Anthropic’s AI can now search your entire Google Workspace without you
  • TestingCatalog: Anthropic begins testing voice mode with three voices in Claude App
  • www.tomsguide.com: Anthropic’s AI assistant can now pull insights from Gmail, Calendar, and Docs—plus conduct in-depth research—freeing professionals from tedious tasks.
  • THE DECODER: Anthropic's AI assistant Claude gets agent-based research and Google Workspace integration
  • Analytics India Magazine: The company also announced Google Workspace integrations for Claude.
  • TestingCatalog: Discover Claude's new Research and Google Workspace integration features, enhancing AI-driven investigations and seamless productivity. Available in beta for select plans.
  • www.computerworld.com: Anthropic’s Claude AI can now search through your Gmail account for ‘Research’
  • gHacks Technology News: Claude AI gets Research Mode and Google Workspace integration
  • Maginative: Anthropic has added Research and Google Workspace integration to Claude, positioning it more directly as a workplace AI assistant that can dig into your files, emails, and the web to deliver actionable insights.
  • www.techradar.com: I tried Claude's new Research feature, and it's just as good as ChatGPT and Google Gemini's Deep Research features
  • www.marktechpost.com: Anthropic Releases a Comprehensive Guide to Building Coding Agents with Claude Code

@sciencedaily.com //
Recent advancements in quantum computing research have yielded promising results. Researchers at the University of the Witwatersrand in Johannesburg, along with collaborators from Huzhou University in China, have discovered a method to shield quantum information from environmental disruptions, potentially leading to more reliable quantum technologies. This breakthrough involves manipulating quantum wave functions to preserve quantum information, which could enhance medical imaging, improve AI diagnostics, and strengthen data security by providing ultra-secure communication.

UK startup Phasecraft has announced a new algorithm, THRIFT, that improves the ability of quantum computers to model new materials and chemicals by a factor of 10. By optimizing quantum simulation, THRIFT enables scientists to model new materials and chemicals faster and more accurately, even on today’s slower machines. Furthermore, Oxford researchers have demonstrated a 25-nanosecond controlled-Z gate with 99.8% fidelity, combining high speed and accuracy in a simplified superconducting circuit. This achievement advances fault-tolerant quantum computing by improving raw gate performance without relying heavily on error correction or added hardware.

Recommended read:
References:
  • The Quantum Insider: Oxford Researchers Demonstrate Fast, 99.8% Fidelity Two-Qubit Gate Using Simplified Circuit Design
  • www.sciencedaily.com: Researchers find a way to shield quantum information from 'noise'
  • Bernard Marr: Quantum computing is poised to revolutionize industries from drug development to cybersecurity, with the global market projected to reach $15 billion by 2030.
  • The Quantum Insider: A new study demonstrates that a digital quantum computer can simulate magnetic behavior at scales and timescales that challenge the best classical methods, opening a path toward practical quantum advantage in materials science.
  • phys.org: Quantum statistical approach quiets big, noisy data

Matt Marshall@AI News | VentureBeat //
Microsoft is enhancing its Copilot Studio platform with AI-driven improvements, introducing deep reasoning capabilities that enable agents to tackle intricate problems through methodical thinking, while combining AI flexibility with deterministic business process automation. The company has also unveiled specialized deep reasoning agents for Microsoft 365 Copilot, named Researcher and Analyst, to help users complete tasks more efficiently. These agents are designed to function like personal data scientists, processing diverse data sources and generating insights through code execution and visualization.

Microsoft's focus includes both securing AI and using AI to bolster security, as demonstrated by the upcoming Microsoft Security Copilot agents and new security features. Microsoft aims to provide an AI-first, end-to-end security platform that helps organizations secure their future; one example is the set of AI agents designed to autonomously assist with phishing triage, data security, and identity management. The Security Copilot tool will automate routine tasks, allowing IT and security staff to focus on more complex issues and aiding in defense against cyberattacks.

Recommended read:
References:
  • Microsoft Security Blog: Learn about the upcoming availability of Microsoft Security Copilot agents and other new offerings for a more secure AI future.
  • www.zdnet.com: Designed for Microsoft's Security Copilot tool, the AI-powered agents will automate basic tasks, freeing IT and security staff to tackle more complex issues.

@phys.org //
Recent mathematical research is pushing the boundaries of theoretical understanding across various domains. One area of focus is the least squares problem with rank constraints, where an objective function is minimized subject to a bound on the rank of the solution; efficient methods for such constrained optimization problems remain an active area of investigation.
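One concrete, well-understood instance (the referenced MathOverflow problem may be a different variant) is minimizing the Frobenius-norm error subject to a rank bound, where the Eckart-Young theorem says the optimum is the truncated SVD:

```python
# Rank-constrained least squares, special case:
#   minimize ||M - X||_F  subject to  rank(X) <= r.
# By the Eckart-Young theorem the minimizer is the rank-r truncated SVD of M.
import numpy as np

def best_rank_r(M: np.ndarray, r: int) -> np.ndarray:
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the r largest singular values/vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 5))
X = best_rank_r(M, 2)
print(np.linalg.matrix_rank(X))          # at most 2
print(np.linalg.norm(M - X))             # optimal rank-2 approximation error
```

More general variants, such as rank-constrained problems of the form min ||AX - B||_F, lack such a clean closed form, which is why efficient algorithms for them remain an open research topic.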

Other work offers a three-level exploration of a "mathematics-driven universe," questioning whether mathematics is discovered or invented and delving into its philosophical implications for modern physics. Furthermore, mathematicians are employing topology to investigate the shape of the universe, exploring possible 2D and 3D spaces to better understand the cosmos we inhabit, an effort that hints at surprising possibilities which could change our understanding of reality.

Recommended read:
References:
  • mathoverflow.net: This article focuses on solving the least square problem
  • medium.com: This article is a three-level journey into a mathematics-driven universe