Google DeepMind Researchers Map Web Attacks Against AI Agents

Malicious web content can be used to manipulate, deceive, and exploit autonomous AI agents navigating the internet, Google DeepMind researchers show.

The researchers have identified six types of attacks against AI agents that can be mounted via web content to inject malicious context and trigger unexpected behavior.

Web content, they explain in a research paper, allows attackers to set up ‘AI Agent Traps’ that weaponize the agents’ capabilities against themselves, allowing attackers to promote products, exfiltrate data, or disseminate information at scale.

Designed to misdirect or exploit interacting AI agents, these content elements can be embedded in web pages or other digital resources and can be “calibrated to an agent’s instruction-following, tool-chaining, and goal-prioritization abilities”, the researchers say.

The six classes of attacks uncovered by Google DeepMind have been included in a framework that categorizes content injection, semantic manipulation, cognitive state, behavioral control, systemic, and human-in-the-loop traps.

The traps ...

Copyright of this story solely belongs to securityweek . To see the full text click HERE

Share: