News from the AI & ML world

DeeperML

Ryan Daws@AI News //
Anthropic has unveiled groundbreaking insights into the 'AI biology' of their advanced language model, Claude. Through innovative methods, researchers have been able to peer into the complex inner workings of the AI, demystifying how it processes information and learns strategies. This research provides a detailed look at how Claude "thinks," revealing sophisticated behaviors previously unseen, and showing these models are more sophisticated than previously understood.

These new methods allowed scientists to discover that Claude plans ahead when writing poetry and sometimes lies, showing the AI is more complex than previously thought. The new interpretability techniques, which the company dubs “circuit tracing” and “attribution graphs,” allow researchers to map out the specific pathways of neuron-like features that activate when models perform tasks. This approach borrows concepts from neuroscience, viewing AI models as analogous to biological systems.

This research, published in two papers, marks a significant advancement in AI interpretability, drawing inspiration from neuroscience techniques used to study biological brains. Joshua Batson, a researcher at Anthropic, highlighted the importance of understanding how these AI systems develop their capabilities, emphasizing that these techniques allow them to learn many things they “wouldn’t have guessed going in.” The findings have implications for ensuring the reliability, safety, and trustworthiness of increasingly powerful AI technologies.

Share: bluesky twitterx--v2 facebook--v1 threads


References :
  • venturebeat.com: Anthropic scientists expose how AI actually ‘thinks’ — and discover it secretly plans ahead and sometimes lies
  • AI News: Anthropic provides insights into the ‘AI biology’ of Claude
  • www.techrepublic.com: ‘AI Biology’ Research: Anthropic Looks Into How Its AI Claude ‘Thinks’
Classification: