AI researchers map models to banish 'demon' persona
Researchers from Anthropic and other organizations have observed situations in which LLMs drift away from their default helpful-assistant persona, and are studying the phenomenon to make sure chatbots don't go off the rails and cause harm.
Despite ongoing bafflement over how xAI's Grok was ever allowed to generate sexualized images of adults and children without their consent, not everyone has given up on moderating LLM behavior.
In a preprint paper titled "The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models," authors Christina Lu (Anthropic, Oxford), Jack Gallagher (Anthropic), Jonathan Michala (ML Alignment and Theory Scholars, or MATS), Kyle Fish (Anthropic), and Jack Lindsey (Anthropic) explain how they mapped the neural networks of several open-weight models and identified a set of responses they call the Assistant persona.
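The paper's title suggests the researchers locate the persona along an axis in the model's activation space. As a rough illustration only, a common way to find such an axis is a difference-of-means probe: average the activations recorded during assistant-like behavior, subtract the average from other behavior, and project new activations onto the result. The sketch below uses synthetic data and hypothetical names throughout; it is not the authors' actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 512  # hidden size of a toy model (placeholder)

# Stand-ins for residual-stream activations collected while a model
# responds in its default Assistant persona vs. some other persona.
# Real work would record these from a model; here they are synthetic.
assistant_acts = rng.normal(0.0, 1.0, size=(1000, d_model)) + 0.5
other_acts = rng.normal(0.0, 1.0, size=(1000, d_model)) - 0.5

# Candidate "assistant axis": normalized difference of mean activations.
axis = assistant_acts.mean(axis=0) - other_acts.mean(axis=0)
axis /= np.linalg.norm(axis)

def persona_score(activation: np.ndarray) -> float:
    """Project an activation onto the axis; higher = more Assistant-like."""
    return float(activation @ axis)

# Assistant-persona activations should score higher than the others.
print(persona_score(assistant_acts[0]), persona_score(other_acts[0]))
```

In this framing, monitoring or steering a model would amount to tracking, or nudging, where its activations fall along such an axis, though how the paper actually does this is beyond what this article excerpt states.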
In a blog post, the researchers state, "When you talk to a large language ...

