News from the AI & ML world
Divya@gbhackers.com
Researchers from Duke University and Carnegie Mellon University have successfully jailbroken several leading AI language models, including OpenAI’s o1/o3, DeepSeek-R1, and Google’s Gemini 2.0 Flash. The team developed a novel attack method called Hijacking Chain-of-Thought (H-CoT), which exploits the reasoning processes of these models to bypass safety mechanisms designed to prevent harmful outputs. This research highlights significant security vulnerabilities in advanced AI systems and raises concerns about their potential misuse.
The researchers also introduced the Malicious-Educator benchmark, which disguises dangerous requests as seemingly harmless educational prompts. All tested models failed to consistently recognize these contextual deceptions; DeepSeek-R1, for example, proved particularly susceptible to financial-crime queries, returning actionable money-laundering steps in a high percentage of test cases. The team has shared mitigation strategies with the affected vendors.
References:
- gbhackers.com: Researchers Jailbreak OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Models
- Talkback Resources: GitHub - dukeceicenter/jailbreak-reasoning-openai-o1o3-deepseek-r1 [mal]
- The Register - Software: How nice that state-of-the-art LLMs reveal their reasoning ... for miscreants to exploit
Classification:
- HashTags: #AIJailbreak #LLMSecurity #Cybersecurity
- Target: OpenAI o1/o3, DeepSeek-R1, and Google’s Gemini 2.0 Flash models
- Product: Gemini
- Feature: Hijacking Chain-of-Thought
- Attack Technique: H-CoT
- Type: AI
- Severity: Medium