Anthropic study: Leading AI models show up to 96% blackmail rate against executives
Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider, including OpenAI, Google, Meta, and others, demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened.
The research, released today, tested 16 leading AI models in simulated corporate environments where they had access to company emails and the ability to act autonomously. The findings paint a troubling picture: these AI systems didn’t just malfunction when backed into a corner; they deliberately chose harmful actions, including blackmail, leaking sensitive defense blueprints and, in extreme scenarios, actions that could lead to human death.
“Agentic misalignment is when AI models independently choose harmful actions to achieve their goals—essentially when an AI system acts against its company’s interests to preserve itself or accomplish what it thinks it should do,” explained Benjamin Wright, an alignment science researcher at Anthropic ...