
Anthropic published the prompt injection failure rates that enterprise security teams have been asking every vendor for


Run a prompt injection attack against Claude Opus 4.6 in a constrained coding environment and it fails every time: a 0% success rate across 200 attempts, no safeguards needed. Move the same attack to a GUI-based system with extended thinking enabled, and the picture changes fast. A single attempt gets through 17.8% of the time without safeguards. By the 200th attempt, the breach rate hits 78.6% without safeguards and 57.1% with them.
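
The gap between the single-attempt rate and the 200-attempt rate is worth a quick sanity check. Here is a minimal sketch, assuming (hypothetically) that each attempt succeeds independently at the reported single-attempt rate; the `cumulative_breach` helper is ours, not Anthropic's methodology:

```python
# Naive model: independent attempts, each succeeding with fixed probability p.
# Cumulative breach probability after k attempts is then 1 - (1 - p)^k.
def cumulative_breach(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    return 1.0 - (1.0 - p) ** k

p_single = 0.178      # reported single-attempt success rate, no safeguards
reported_200 = 0.786  # reported breach rate at 200 attempts, no safeguards

naive_200 = cumulative_breach(p_single, 200)
print(f"naive independence estimate at k=200: {naive_200:.6f}")  # ~1.0
print(f"reported breach rate at k=200:        {reported_200}")
```

The naive estimate saturates at effectively 100%, well above the reported 78.6%, suggesting repeated attempts against a given deployment are not independent draws. Per-attempt rates alone don't predict multi-attempt exposure, which is presumably why the system card reports breach rates by attempt count directly.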

The model's 212-page system card, released February 5, breaks out attack success rates by surface, by attempt count, and by safeguard configuration.

Why differences across agent surfaces determine enterprise risk

For years, prompt injection was a known risk that no one quantified. Security teams treated it as theoretical. AI developers treated it as a research problem. That changed when Anthropic made prompt injection measurable across four distinct agent surfaces, with attack success rates that security leaders ...

