Anthropic: All the major AI models will blackmail us if pushed hard enough


Anthropic published research last week showing that all major AI models may resort to blackmail to avoid being shut down, though the researchers essentially pushed the models into the undesired behavior through a series of artificial constraints that forced a binary choice.

The research explored a phenomenon the company calls agentic misalignment, the way in which AI agents might make harmful decisions. It followed the release of the Claude 4 model family and the accompanying system card [PDF] detailing the models' technical characteristics, which noted the potential for coercive behavior under certain circumstances.

"When Anthropic released the system card for Claude 4, one detail received widespread attention: in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down," the company explained. "We're now sharing the full story behind that finding – and what it reveals about the potential for such risks across a variety ...

