AI agents fail 63% of the time on complex tasks. Patronus AI says its new 'living' training worlds can fix that.

19 hours ago venturebeat

Patronus AI, the artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a new training architecture Tuesday that it says represents a fundamental shift in how AI agents learn to perform complex tasks.

The technology, which the company calls "Generative Simulators," creates adaptive simulation environments that continuously generate new challenges, update rules dynamically, and evaluate an agent's performance as it learns — all in real time. The approach marks a departure from the static benchmarks that have long served as the industry standard for measuring AI capabilities but have increasingly come under fire for failing to predict real-world performance.

"Traditional benchmarks measure isolated capabilities, but they miss the interruptions, context switches, and layered decision-making that define real work," said Anand Kannappan, chief executive and co-founder of Patronus AI, in an exclusive interview with VentureBeat. "For agents to perform at human levels, they ...

