Gemini Robotics 1.5 brings AI agents into the physical world

Earlier this year, we made incredible progress bringing Gemini's multimodal understanding into the physical world, starting with the Gemini Robotics family of models.

Today, we’re taking another step towards advancing intelligent, truly general-purpose robots. We're introducing two models that unlock agentic experiences with advanced thinking:

Gemini Robotics 1.5 – Our most capable vision-language-action (VLA) model turns visual information and instructions into motor commands for a robot to perform a task. This model thinks before taking action and shows its process, helping robots assess and complete complex tasks more transparently. It also learns across embodiments, accelerating skill learning.
Gemini Robotics-ER 1.5 – Our most capable vision-language model (VLM) reasons about the physical world, natively calls digital tools and creates detailed, multi-step plans to complete a mission. This model now achieves state-of-the-art performance across spatial understanding benchmarks.

These advances will help developers build more capable and versatile robots that ...

Copyright of this story belongs solely to deepmind.com.
