ExclusiveAnthropic Research Memo Shows Focus on Rogue Agents, Scheming Models
Puncturing the buzz over AI agents such as Anthropic’s Claude Code and the open-source project OpenClaw is the prospect that these agents could get tricked into revealing sensitive information such as a person’s banking information. In a sign of those concerns, Anthropic earlier this year singled out rogue agents as a topic of focus for its research fellows. Anthropic’s staff proposed that the fellows train an agent to misbehave in certain circumstances—say, by writing code with...