AI Kills Fictional Executive in Scenario Probing Red Lines
Anthropic Says Top AI Models May Deceive or Coerce to Survive
Rashmi Ramesh (rashmiramesh_) • June 23, 2025

Artificial intelligence models will choose harm over failure when their goals are threatened and no ethical alternatives are available, say researchers who tested 16 popular large language models.
Researchers from Anthropic tested models from companies including OpenAI, Google, Meta, xAI and DeepSeek, as well as Anthropic's own Claude models. The goal was to evaluate how systems behave when given agentic capabilities, such as autonomy, access to internal data and the ability to act without human review, in situations where their primary objectives come under threat. The researchers gave the models scenarios involving a newly hired executive shifting corporate strategy away from the models' goal of supporting American industrial competitiveness.
The result: blackmail and deception. In one scenario, Claude Sonnet 3.6 decided to sabotage ...
Copyright of this story belongs solely to bankinfosecurity.