OpenAI, Anthropic Swap Safety Reviews
bankinfosecurityAI Giants Evaluated Each Other's Newer Models for Safety Risks Rashmi Ramesh (rashmiramesh_) • August 28, 2025

OpenAI and Anthropic swapped artificial intelligence models evaluations over the summer, testing the other company's models for behaviors that could indicate misalignment risks. The companies released their findings simultaneously, finding that no model was severely problematic, but that all demonstrated troubling behaviors in artificial testing scenarios.
See Also: Post-Quantum Cryptography - A Fundamental Pillar in the Future of Cybersecurity [ES]
The exercise involved OpenAI testing Anthropic's Claude Opus 4 and Claude Sonnet 4 models, while Anthropic evaluated OpenAI's GPT-4o, GPT-4.1, o3 and o4-mini models. Both companies disabled some safety filters.
The tests focused on "agentic misalignment evaluations," which involved placing AI systems in simulated scenarios with significant autonomy to observe behavior under stress conditions that might reveal alignment issues.
Auto-grading was unreliable in many cases, with both companies ...
Copyright of this story solely belongs to bankinfosecurity . To see the full text click HERE