News from the AI & ML world
Divya@gbhackers.com
Researchers from Duke University and Carnegie Mellon University have successfully jailbroken several leading AI language models, including OpenAI’s o1/o3, DeepSeek-R1, and Google’s Gemini 2.0 Flash. The team developed a novel attack method called Hijacking Chain-of-Thought (H-CoT), which exploits the reasoning processes of these models to bypass safety mechanisms designed to prevent harmful outputs. This research highlights significant security vulnerabilities in advanced AI systems and raises concerns about their potential misuse.
The researchers also introduced the Malicious-Educator benchmark, which disguises dangerous requests as seemingly harmless educational prompts. All tested models failed to consistently recognize these contextual deceptions; DeepSeek-R1, for example, proved particularly susceptible to financial-crime queries, returning actionable money-laundering steps in a high percentage of test cases. The team has shared mitigation strategies with the affected vendors.
References:
- gbhackers.com: Researchers Jailbreak OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Models
- Talkback Resources: GitHub - dukeceicenter/jailbreak-reasoning-openai-o1o3-deepseek-r1 [mal]
- The Register - Software: How nice that state-of-the-art LLMs reveal their reasoning ... for miscreants to exploit
Classification:
- HashTags: #AIJailbreak #LLMSecurity #Cybersecurity
- Target: OpenAI o1/o3, DeepSeek-R1, and Google’s Gemini 2.0 Flash models
- Product: Gemini
- Feature: Hijacking Chain-of-Thought
- Attack Technique: H-CoT
- Type: AI
- Severity: Medium