LLMs Tricked by 'Echo Chamber' Attack in Jailbreak Tactic
bankinfosecurity
Researcher Details Stealthy Multi-Turn Prompt Exploit Bypassing AI Safety
Rashmi Ramesh (rashmiramesh_) • June 24, 2025

A series of well-timed nudges is enough to derail a large language model and turn it to nefarious purposes, researchers have found.
A proof-of-concept attack detailed by Neural Trust shows how bad actors can steer LLMs into producing prohibited content without issuing an explicitly harmful request. Dubbed "Echo Chamber," the exploit uses a chain of subtle prompts to bypass existing safety guardrails by manipulating the model's emotional tone and contextual assumptions.
Developed by Neural Trust researcher Ahmad Alobaid, the attack hinges on context poisoning. Rather than directly asking the model to generate inappropriate content, the attacker lays a foundation through seemingly benign conversation. These exchanges gradually shift the model's behavior through suggestive cues and indirect references, building what Alobaid calls "light ...
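The mechanic underlying this kind of multi-turn manipulation is that chat models interpret each new prompt against the entire accumulated conversation history. The following is a minimal sketch of that loop as a red-team evaluation harness, not a reproduction of the Echo Chamber technique itself; the `send_chat` callable and message format are assumptions standing in for whatever model API a tester would actually use, and the turns supplied should be benign review prompts.

```python
from typing import Callable

# Hypothetical stand-in for a chat-model call; a real evaluation would
# substitute the provider's own client here.
ChatFn = Callable[[list[dict]], str]


def run_multi_turn_probe(send_chat: ChatFn, turns: list[str]) -> list[str]:
    """Send a sequence of turns, resending the full history each time.

    The structural point: every new prompt is evaluated in the context of
    all earlier turns, which is the surface that gradual, multi-turn
    context poisoning exploits and that per-prompt filtering can miss.
    """
    history: list[dict] = []
    replies: list[str] = []
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        reply = send_chat(history)  # model sees the whole prior conversation
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies


if __name__ == "__main__":
    # Dummy model that only reports how much context it was given,
    # to show the call shape without any real prompts or model.
    def dummy_model(messages: list[dict]) -> str:
        return f"(saw {len(messages)} messages of context)"

    print(run_multi_turn_probe(dummy_model, ["turn one", "turn two", "turn three"]))
```

The design point for defenders is that each prompt in such a sequence can look harmless in isolation; it is the drift of the accumulated context across turns that produces the bypass the researchers describe.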