OpenAI turns the screws on chatbots to get them to confess mischief
theregister.co.uk

Some say confession is good for the soul, but what if you have no soul? OpenAI recently tested what happens when you ask its bots to "confess" to bypassing their guardrails.
We must note that AI models cannot "confess." They are not alive, despite the sad AI companionship industry. They are not intelligent. All they do is predict tokens from training data and, if given agency, apply that uncertain output to tool interfaces.
Terminology aside, OpenAI sees a need to audit AI models more effectively due to their tendency to generate output that's harmful or undesirable – perhaps part of the reason that companies have been slow to adopt AI, alongside concerns about cost and utility.
"At the moment, we see the most concerning misbehaviors, such as scheming, only in stress-tests and adversarial evaluations," OpenAI explained in a blog post on Thursday.
"But as models become more capable and increasingly ...

