Surprisingly enough, it seems some AI agents aren't quite up to scratch on some basic business tests


Salesforce research finds single-turn tasks see only 58% success, while multi-turn effectiveness drops to 35%
Reasoning models like gemini-2.5-pro tend to outperform lighter models
CRMArena-Pro has proven to be a challenging benchmark

Researchers from Salesforce AI Research have introduced a new benchmark – CRMArena-Pro – which uses synthetic enterprise data to assess LLM agent performance in different CRM scenarios.

It found LLM agents achieved around 58% success on tasks that can be completed in a single step, while success on tasks requiring multiple interactions dropped to just 35% – barely more than one in three.

Although models like gemini-2.5-pro achieved over 83% success in workflow execution, the Salesforce researchers still highlighted some concerns with AI agents, suggesting they might not quite be up to scratch after all.

Copyright of this story solely belongs to techradar.com.
