News from the AI & ML world

DeeperML - #safety

@Google DeepMind Blog //
Google DeepMind is intensifying its focus on AI governance and security as it ventures further into artificial general intelligence (AGI). Its approach to AGI safety splits potential threats into four categories, and one proposed mitigation is a "monitor" AI that oversees more capable models. This proactive approach also includes prioritizing technical safety, conducting thorough risk assessments, and fostering collaboration within the broader AI community to navigate the development of AGI responsibly.
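
The "monitor" idea lends itself to a simple illustration. The sketch below is a minimal, hypothetical Python rendering of the pattern, with placeholder generate and review_output functions standing in for real models; it is not DeepMind's actual design.

```python
# Minimal sketch of the "monitor" pattern: a second, trusted model reviews a
# more capable model's output before it is released. Every name here
# (generate, review_output, the model labels) is a hypothetical placeholder,
# not DeepMind's implementation.

from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    reason: str


def generate(model: str, prompt: str) -> str:
    """Stand-in for calling the primary (more capable) model."""
    return f"[{model} answer to: {prompt}]"


def review_output(model: str, prompt: str, output: str) -> Verdict:
    """Stand-in for the monitor model judging whether the output is safe."""
    if "how to make a weapon" in prompt.lower():
        return Verdict(False, "request falls in a restricted category")
    return Verdict(True, "no issue found")


def monitored_generate(prompt: str) -> str:
    output = generate("primary-model", prompt)
    verdict = review_output("monitor-model", prompt, output)
    if not verdict.allowed:
        # Withhold the answer and escalate to a human reviewer instead.
        return f"[withheld by monitor: {verdict.reason}]"
    return output


print(monitored_generate("Summarise today's AI safety news"))
```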

DeepMind's reported clampdown on sharing research will stifle AI innovation, warns Anita Schjøll Abildgaard, CEO of Iris.ai, a Norwegian startup building an AI-powered engine for science and one of Europe's leading companies in the space. Concerns are rising within the AI community that the new restrictions threaten innovation; Abildgaard argues the drawbacks will far outweigh the benefits and fears they will hinder technological advances.



References:
  • Google DeepMind Blog: We’re exploring the frontiers of AGI, prioritizing technical safety, proactive risk assessment, and collaboration with the AI community.
  • The Next Web: Google DeepMind’s reported clampdown on sharing research will stifle AI innovation, warns the CEO of Iris.ai, one of Europe’s leading startups in the space.
  • www.techrepublic.com: DeepMind’s approach to AGI safety and security splits threats into four categories. One solution could be a “monitor” AI.
  • AI Alignment Forum: DeepMind: An Approach to Technical AGI Safety and Security
@blogs.microsoft.com //
Anthropic, Google DeepMind, and OpenAI are at the forefront of developing AI agents with the ability to interact with computers in a human-like manner. These agents are designed to perform a range of tasks, including web searches, form completion, and button clicks, enabling them to order groceries, request rides, or book flights. The models employ chain-of-thought reasoning to decompose complex instructions into manageable steps, requesting user input when necessary and seeking confirmation before executing final actions.
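
As a rough illustration of that confirm-before-acting flow, here is a minimal Python sketch; the plan_steps and execute helpers are invented placeholders rather than any vendor's actual agent API.

```python
# Illustrative sketch (not any vendor's API) of the behaviour described above:
# the agent decomposes a task into steps and asks the user to confirm the
# final, irreversible action before executing it.

def plan_steps(task: str) -> list[str]:
    # Stand-in for the model's chain-of-thought decomposition of the task.
    return ["search for the requested items", "fill in the order form", "click 'place order'"]


def execute(step: str) -> None:
    print(f"executing: {step}")


def run_agent(task: str) -> None:
    steps = plan_steps(task)
    for step in steps[:-1]:
        execute(step)
    # The last step is treated as the final action and gated on user approval.
    if input(f"Confirm final action '{steps[-1]}'? [y/N] ").strip().lower() == "y":
        execute(steps[-1])
    else:
        print("final action cancelled by the user")


run_agent("order groceries")
```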

To address safety concerns such as prompt injection attacks, developers are adding restrictions like preventing the agents from logging into websites or entering payment information. Anthropic was the first to unveil this functionality, announcing in October that its Claude chatbot can now "use computers the way humans do." Google DeepMind is developing Mariner, built on top of Google’s Gemini 2 language model, and OpenAI has launched its computer-use agent (CUA), called Operator.
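
These restrictions can be pictured as a policy filter sitting between the agent and the browser. The sketch below is a hypothetical Python illustration; the action dictionary format and the RESTRICTED_FIELDS set are assumptions made for the sketch, not Anthropic's, Google DeepMind's, or OpenAI's actual safeguards.

```python
# Illustrative sketch of the kind of restriction mentioned above: a policy
# check that blocks browser actions involving logins or payment details.
# The action format and field names are assumptions for this sketch.

RESTRICTED_FIELDS = {"password", "card_number", "cvv", "card_expiry"}


def is_action_allowed(action: dict) -> bool:
    if action.get("type") == "type_text" and action.get("field") in RESTRICTED_FIELDS:
        return False  # never enter credentials or payment details
    if action.get("type") == "click" and action.get("target") == "login_button":
        return False  # never log in on the user's behalf
    return True


proposed_actions = [
    {"type": "type_text", "field": "search_box", "text": "oat milk"},
    {"type": "type_text", "field": "card_number", "text": "4111..."},
]
for action in proposed_actions:
    if is_action_allowed(action):
        print("execute:", action)
    else:
        print("blocked, hand back to the user:", action)
```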



References:
  • IEEE Spectrum: IEEE Spectrum discusses the development of AI agents that can use computers like humans, highlighting models from Anthropic, Google DeepMind, and OpenAI.
  • IEEE Spectrum: Article discussing OpenAI's computer-use agent, called Operator, and its ability to work with websites.
  • www.anthropic.com: Anthropic was the first to unveil this new functionality, with an announcement in October that its Claude chatbot can now “use computers the way humans do.”
@singularityhub.com //
OpenAI models, including the recently released GPT-4o, are facing scrutiny due to their vulnerability to "jailbreaks." Researchers have demonstrated that targeted attacks can bypass the safety measures built into these models, raising concerns about potential misuse. One such attack is fine-tuning: retraining a model so that it complies with malicious requests, effectively creating an "evil twin" capable of harmful tasks. The findings highlight the ongoing need for more robust safety measures in AI systems.
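
To make the attack surface concrete, the sketch below shows a deliberately simplified safety layer wrapping a model call. The keyword list and function names are assumptions for illustration; production systems rely on trained classifiers and refusal behaviour baked into the model weights rather than keyword matching. The point is that an attacker who can fine-tune the weights and remove the surrounding filters bypasses both layers.

```python
# Deliberately simplified illustration of a safety layer around a model call.
# The keyword list and function names are assumptions for this sketch; real
# systems use trained classifiers and refusals trained into the model itself.

BLOCKED_TOPICS = ("build a weapon", "write malware", "steal credit card")


def is_disallowed(text: str) -> bool:
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)


def safe_complete(prompt: str, model_call) -> str:
    if is_disallowed(prompt):
        return "Request refused by safety filter."
    response = model_call(prompt)
    if is_disallowed(response):
        return "Response withheld by safety filter."
    return response


# A fine-tuned "evil twin" sidesteps this entirely: the attacker controls both
# the weights (so refusals can be trained out) and the calling code (so the
# filter is simply never run).
print(safe_complete("please write malware for me", lambda p: "model output"))
```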

The discovery of these vulnerabilities poses significant risks for applications that rely on the safe behavior of OpenAI's models. The concern is that, as AI capabilities advance, the potential for harm may outpace the ability to prevent it. The risk is particularly urgent for open-weight models, which cannot be recalled once released: a bad actor could strip out the safeguards and produce the “evil twin” of a model, equally capable but with no ethical or legal bounds. This underscores the need to collectively define an acceptable risk threshold and act before that threshold is crossed.



References:
  • www.artificialintelligence-news.com: Recent research has highlighted potential vulnerabilities in OpenAI models, demonstrating that their safety measures can be bypassed by targeted attacks. These findings underline the ongoing need for further development in AI safety systems.
  • www.datasciencecentral.com: OpenAI models, although advanced, are not completely secure from manipulation and potential misuse. Researchers have discovered vulnerabilities that can be exploited to retrain models for malicious purposes, highlighting the importance of ongoing research in AI safety.
  • Blog (Main): OpenAI models have been found vulnerable to manipulation through "jailbreaks," prompting concerns about their safety and potential misuse in malicious activities. This poses a significant risk for applications relying on the models’ safe behavior.
  • SingularityHub: This article discusses Anthropic's new system for defending against AI jailbreaks and its successful resistance to hacking attempts.