Gemini 3 Pro scores 69% trust in blinded testing up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks

5 hours ago venturebeat

Just a few short weeks ago, Google debuted its Gemini 3 model, claiming it scored a leadership position in multiple AI benchmarks. But the challenge with vendor-provided benchmarks is that they are just that — vendor-provided.

A new vendor-neutral evaluation from Prolific, however, puts Gemini 3 at the top of the leaderboard. This isn't on a set of academic benchmarks; rather, it's on a set of real-world attributes that actual users and organizations care about.

Prolific was founded by researchers at the University of Oxford. The company delivers high-quality, reliable human data to power rigorous research and ethical AI development. The company's “HUMAINE benchmark” applies this approach by using representative human sampling and blind testing to rigorously compare AI models across a variety of user scenarios, measuring not just technical performance but also user trust, adaptability and communication style.

The latest HUMAINE test evaluated 26,000 users in ...

Copyright of this story solely belongs to venturebeat . To see the full text click HERE

Share: