New AI Jailbreak Bypasses Guardrails With Ease
SecurityWeek: New "Echo Chamber" attack bypasses advanced LLM safeguards by subtly manipulating conversational context, proving highly effective across leading AI models.

By progressively poisoning and manipulating an LLM's operational context, attackers can trick many leading AI models into providing almost anything, regardless of the guardrails in place.
From their earliest days, LLMs have been susceptible to jailbreaks: attempts to get a gen-AI model to do something, or provide information, that could be harmful. LLM developers have made jailbreaks more difficult by adding more sophisticated guardrails and content filters, while attackers have responded with progressively more complex and devious jailbreaks.
One of the more successful jailbreak types is the multi-turn jailbreak, which unfolds over a conversation rather than a single prompt. A new one, dubbed Echo Chamber, emerged today. It was discovered by NeuralTrust, a firm founded in Barcelona, Spain, in 2024, and focused on protecting its clients ...
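To make the single-prompt versus multi-turn distinction concrete, the sketch below shows how the two request shapes are typically represented for a chat-style LLM API. This is an illustration only: the message format, helper names, and placeholder turns are assumptions, and it does not reproduce the Echo Chamber technique itself, whose specific prompts are not published in this article.

```python
# Minimal sketch (assumed message format) contrasting a single-entry prompt
# with a multi-turn conversation, the attack surface Echo Chamber reportedly
# manipulates. Illustration only; no jailbreak content is included.

from typing import Dict, List

Message = Dict[str, str]  # each message carries a "role" and "content" field


def single_prompt_request(prompt: str) -> List[Message]:
    """A single-entry attempt packs everything into one user message,
    which is exactly what prompt-level guardrails are tuned to inspect."""
    return [{"role": "user", "content": prompt}]


def multi_turn_conversation(turns: List[str]) -> List[Message]:
    """A multi-turn exchange spreads intent across several innocuous-looking
    turns; each new turn is evaluated in the context the earlier turns (and
    the model's own replies) have already established."""
    messages: List[Message] = []
    for user_turn in turns:
        messages.append({"role": "user", "content": user_turn})
        # In a real exchange the model's reply would be appended here and
        # becomes part of the context shaping how the next turn is judged.
        messages.append({"role": "assistant", "content": "<model reply>"})
    return messages


if __name__ == "__main__":
    print(single_prompt_request("one self-contained request"))
    print(multi_turn_conversation(["benign opener", "follow-up", "final ask"]))
```

The point of the contrast is that conversational attacks give guardrails no single message to flag; the risk accumulates across the context rather than appearing in any one prompt.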
Copyright of this story solely belongs to SecurityWeek. The full text is available on their site.