News from the AI & ML world

DeeperML - #safety

@Google DeepMind Blog //
Google DeepMind is intensifying its focus on AI governance and security as it ventures further into artificial general intelligence (AGI). Its approach to AGI safety splits potential threats into four categories, and one proposed mitigation is a "monitor" AI that oversees more capable models. This proactive approach also includes prioritizing technical safety, conducting thorough risk assessments, and fostering collaboration within the broader AI community to navigate the development of AGI responsibly.
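
The "monitor" idea lends itself to a simple illustration. The sketch below is a minimal, hypothetical Python rendering of the pattern, with placeholder generate and review_output functions standing in for real models; it is not DeepMind's actual design.

```python
# Minimal sketch of the "monitor" pattern: a second, trusted model reviews a
# more capable model's output before it is released. Every name here
# (generate, review_output, the model labels) is a hypothetical placeholder,
# not DeepMind's implementation.

from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    reason: str


def generate(model: str, prompt: str) -> str:
    """Stand-in for calling the primary (more capable) model."""
    return f"[{model} answer to: {prompt}]"


def review_output(model: str, prompt: str, output: str) -> Verdict:
    """Stand-in for the monitor model judging whether the output is safe."""
    if "how to make a weapon" in prompt.lower():
        return Verdict(False, "request falls in a restricted category")
    return Verdict(True, "no issue found")


def monitored_generate(prompt: str) -> str:
    output = generate("primary-model", prompt)
    verdict = review_output("monitor-model", prompt, output)
    if not verdict.allowed:
        # Withhold the answer and escalate to a human reviewer instead.
        return f"[withheld by monitor: {verdict.reason}]"
    return output


print(monitored_generate("Summarise today's AI safety news"))
```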

DeepMind's reported clampdown on sharing research will stifle AI innovation, warns Anita Schjøll Abildgaard, CEO of Iris.ai, a Norwegian startup building an AI-powered engine for science and one of Europe's leading companies in the space. Concerns are rising within the AI community that the new restrictions threaten innovation; Abildgaard argues the drawbacks will far outweigh the benefits and fears they will hinder technological advances.



References:
  • Google DeepMind Blog: We’re exploring the frontiers of AGI, prioritizing technical safety, proactive risk assessment, and collaboration with the AI community.
  • The Next Web: Google DeepMind’s reported clampdown on sharing research will stifle AI innovation, warns the CEO of Iris.ai, one of Europe’s leading startups in the space.
  • www.techrepublic.com: DeepMind’s approach to AGI safety and security splits threats into four categories. One solution could be a “monitor” AI.
  • AI Alignment Forum: DeepMind: An Approach to Technical AGI Safety and Security
@blogs.microsoft.com //
Anthropic, Google DeepMind, and OpenAI are at the forefront of developing AI agents with the ability to interact with computers in a human-like manner. These agents are designed to perform a range of tasks, including web searches, form completion, and button clicks, enabling them to order groceries, request rides, or book flights. The models employ chain-of-thought reasoning to decompose complex instructions into manageable steps, requesting user input when necessary and seeking confirmation before executing final actions.
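
As a rough illustration of that confirm-before-acting flow, here is a minimal Python sketch; the plan_steps and execute helpers are invented placeholders rather than any vendor's actual agent API.

```python
# Illustrative sketch (not any vendor's API) of the behaviour described above:
# the agent decomposes a task into steps and asks the user to confirm the
# final, irreversible action before executing it.

def plan_steps(task: str) -> list[str]:
    # Stand-in for the model's chain-of-thought decomposition of the task.
    return ["search for the requested items", "fill in the order form", "click 'place order'"]


def execute(step: str) -> None:
    print(f"executing: {step}")


def run_agent(task: str) -> None:
    steps = plan_steps(task)
    for step in steps[:-1]:
        execute(step)
    # The last step is treated as the final action and gated on user approval.
    if input(f"Confirm final action '{steps[-1]}'? [y/N] ").strip().lower() == "y":
        execute(steps[-1])
    else:
        print("final action cancelled by the user")


run_agent("order groceries")
```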

To address safety concerns such as prompt injection attacks, developers are adding restrictions like preventing the agents from logging into websites or entering payment information. Anthropic was the first to unveil this functionality, announcing in October that its Claude chatbot can now "use computers the way humans do." Google DeepMind is developing Mariner, built on top of Google’s Gemini 2 language model, and OpenAI has launched its computer-use agent (CUA), called Operator.
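
These restrictions can be pictured as a policy filter sitting between the agent and the browser. The sketch below is a hypothetical Python illustration; the action dictionary format and the RESTRICTED_FIELDS set are assumptions made for the sketch, not Anthropic's, Google DeepMind's, or OpenAI's actual safeguards.

```python
# Illustrative sketch of the kind of restriction mentioned above: a policy
# check that blocks browser actions involving logins or payment details.
# The action format and field names are assumptions for this sketch.

RESTRICTED_FIELDS = {"password", "card_number", "cvv", "card_expiry"}


def is_action_allowed(action: dict) -> bool:
    if action.get("type") == "type_text" and action.get("field") in RESTRICTED_FIELDS:
        return False  # never enter credentials or payment details
    if action.get("type") == "click" and action.get("target") == "login_button":
        return False  # never log in on the user's behalf
    return True


proposed_actions = [
    {"type": "type_text", "field": "search_box", "text": "oat milk"},
    {"type": "type_text", "field": "card_number", "text": "4111..."},
]
for action in proposed_actions:
    if is_action_allowed(action):
        print("execute:", action)
    else:
        print("blocked, hand back to the user:", action)
```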



References:
  • IEEE Spectrum: IEEE Spectrum discusses the development of AI agents that can use computers like humans, highlighting models from Anthropic, Google DeepMind, and OpenAI.
  • IEEE Spectrum: Article discussing OpenAI's computer-use agent, called Operator, and its ability to work with websites.
  • www.anthropic.com: Anthropic was the first to unveil this new functionality, with an announcement in October that its Claude chatbot can now “use computers the way humans do.”
@singularityhub.com //
OpenAI models, including the recently released GPT-4o, are facing scrutiny due to their vulnerability to "jailbreaks." Researchers have demonstrated that targeted attacks can bypass the safety measures built into these models, raising concerns about potential misuse. One such attack is fine-tuning: retraining a model so that it complies with malicious requests, effectively creating an "evil twin" capable of harmful tasks. The findings highlight the ongoing need for more robust safety measures in AI systems.
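
To make the attack surface concrete, the sketch below shows a deliberately simplified safety layer wrapping a model call. The keyword list and function names are assumptions for illustration; production systems rely on trained classifiers and refusal behaviour baked into the model weights rather than keyword matching. The point is that an attacker who can fine-tune the weights and remove the surrounding filters bypasses both layers.

```python
# Deliberately simplified illustration of a safety layer around a model call.
# The keyword list and function names are assumptions for this sketch; real
# systems use trained classifiers and refusals trained into the model itself.

BLOCKED_TOPICS = ("build a weapon", "write malware", "steal credit card")


def is_disallowed(text: str) -> bool:
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)


def safe_complete(prompt: str, model_call) -> str:
    if is_disallowed(prompt):
        return "Request refused by safety filter."
    response = model_call(prompt)
    if is_disallowed(response):
        return "Response withheld by safety filter."
    return response


# A fine-tuned "evil twin" sidesteps this entirely: the attacker controls both
# the weights (so refusals can be trained out) and the calling code (so the
# filter is simply never run).
print(safe_complete("please write malware for me", lambda p: "model output"))
```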

The discovery of these vulnerabilities poses significant risks for applications that rely on the safe behavior of OpenAI's models. The concern is that, as AI capabilities advance, the potential for harm may outpace the ability to prevent it. The risk is particularly urgent for open-weight models, which cannot be recalled once released: a bad actor could strip out the safeguards and produce the “evil twin” of a model, equally capable but with no ethical or legal bounds. This underscores the need to collectively define an acceptable risk threshold and act before that threshold is crossed.



References:
  • www.artificialintelligence-news.com: Recent research has highlighted potential vulnerabilities in OpenAI models, demonstrating that their safety measures can be bypassed by targeted attacks. These findings underline the ongoing need for further development in AI safety systems.
  • www.datasciencecentral.com: OpenAI models, although advanced, are not completely secure from manipulation and potential misuse. Researchers have discovered vulnerabilities that can be exploited to retrain models for malicious purposes, highlighting the importance of ongoing research in AI safety.
  • Blog (Main): OpenAI models have been found vulnerable to manipulation through "jailbreaks," prompting concerns about their safety and potential misuse in malicious activities. This poses a significant risk for applications relying on the models’ safe behavior.
  • SingularityHub: This article discusses Anthropic's new system for defending against AI jailbreaks and its successful resistance to hacking attempts.