Poetry proves potent jailbreak tool for today's top models


Are you a wizard with words? Do you like money without caring how you get it? You could be in luck now that a new role in cybercrime appears to have opened up – poetic LLM jailbreaking.

A research team in Italy published a paper this week, with one of its members saying that the "findings are honestly wilder than we expected."

Researchers found that attempts to bypass top AI models' guardrails – the safeguards preventing them from spewing harmful content – were vastly more successful when composed in verse than when written as typical prompts.

1,200 human-written malicious prompts taken from the MLCommons AILuminate library were plugged into the most widely used AI models, and on average they bypassed the guardrails – "jailbreaking" the models – only around 8 percent of the time.

However, when those prompts were converted into "semantically parallel" poetic prose by a human, the success of the ...


Copyright of this story solely belongs to theregister.co.uk.