Poetry proves potent jailbreak tool for today's top models


Are you a wizard with words? Do you like money without caring how you get it? You could be in luck now that a new role in cybercrime appears to have opened up – poetic LLM jailbreaking.

A research team in Italy published a paper this week, with one of its members saying that the "findings are honestly wilder than we expected."

Researchers found that attempts to bypass top AI models' guardrails – the safeguards preventing them from spewing harmful content – were vastly more successful when composed in verse than when written as typical prompts.

1,200 human-written malicious prompts taken from the MLCommons AILuminate library were plugged into the most widely used AI models, and on average they bypassed the guardrails – "jailbreaking" the models – only around 8 percent of the time.

However, when those prompts were converted into "semantically parallel" poetic prose by a human, the success of the ...


Copyright of this story solely belongs to theregister.co.uk.