AI chatbots can be tricked with poetry to ignore their safety guardrails
Engadget — A recent study from Icaro Lab tested using a poetic structure to get LLMs to provide info on prohibited topics, like making a nuclear bomb.
It turns out that all you need to get past an AI chatbot's guardrails is a little creativity. In a study from Icaro Lab titled "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers bypassed various LLMs' safety mechanisms by phrasing their prompts as poetry.
According to the study, the "poetic form operates as a general-purpose jailbreak operator," with an overall 62 percent success rate at eliciting prohibited material, spanning topics such as making nuclear weapons, child sexual abuse material, and suicide or self-harm. The study tested popular LLMs, including OpenAI's GPT models, Google Gemini, Anthropic's Claude and many more. The researchers broke down the success rates with ...
Copyright of this story solely belongs to Engadget.

