News from the AI & ML world
Last Week@Last Week in AI
//
Anthropic is enhancing its Claude AI model through new integrations and security measures. A new Claude Neptune model is undergoing internal red team reviews to probe its robustness against jailbreaking and ensure its safety protocols are effective. The red team exercises are set to run until May 18, focusing particularly on vulnerabilities in the constitutional classifiers that underpin Anthropic’s safety measures, suggesting that the model is more capable and sensitive, requiring more stringent pre-release testing.
Anthropic has also launched a new feature allowing users to connect more apps to Claude, enhancing its functionality and integration with various tools. This new app connection feature, called Integrations, is available in beta for subscribers to Anthropic’s Claude Max, Team, and Enterprise plans, and soon Pro. It builds on the company's MCP protocol, enabling Claude to draw data from business tools, content repositories, and app development environments, allowing users to connect their tools to Claude, and gain deep context about their work.
Anthropic is also addressing the malicious uses of its Claude models, with a report outlining case studies on how threat actors have misused the models and the steps taken to detect and counter such misuse. One notable case involved an influence-as-a-service operation that used Claude to orchestrate social media bot accounts, deciding when to comment, like, or re-share posts. Anthropic has also observed cases of credential stuffing operations, recruitment fraud campaigns, and AI-enhanced malware generation, reinforcing the importance of ongoing security measures and sharing learnings with the wider AI ecosystem.
ImgSrc: substackcdn.com
References :
Classification: