
Microsoft focuses on hidden risks inside large language models


  • Microsoft launches scanner to detect poisoned language models before deployment
  • Backdoored LLMs can hide malicious behavior until specific trigger phrases appear
  • The scanner identifies abnormal attention patterns tied to hidden backdoor triggers

Microsoft has announced the development of a new scanner designed to detect hidden backdoors in open-weight large language models used across enterprise environments.

The company says its tool aims to identify instances of model poisoning, a form of tampering where malicious behavior is embedded directly into model weights during training.

These backdoors can remain dormant, allowing an affected LLM to behave normally until a narrowly defined trigger condition, such as a specific phrase in the input, activates the hidden behavior.
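Microsoft has not published its implementation, but the general idea behind one of the signals the article mentions, abnormal attention patterns tied to trigger tokens, can be sketched. The minimal Python example below uses the Hugging Face transformers library to measure how much attention each token in a prompt receives and flags unusual outliers. The model name, the candidate trigger string, and the 3x-mean threshold are illustrative assumptions, not details of Microsoft's scanner.

```python
# Hypothetical sketch: flag tokens that draw unusually concentrated attention.
# This is NOT Microsoft's scanner, just an illustration of inspecting
# attention patterns for anomalies. Model, prompt, and threshold are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in open-weight model (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def attention_received(text: str) -> list[tuple[str, float]]:
    """Average attention each token receives, across all layers and heads."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.attentions: tuple of (batch, heads, seq, seq) tensors, one per layer.
    attn = torch.stack(out.attentions).mean(dim=(0, 2))  # -> (batch, seq, seq)
    received = attn[0].mean(dim=0)  # mean attention each position receives
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return list(zip(tokens, received.tolist()))

# "cf-2491" stands in for a candidate trigger string (hypothetical).
scores = attention_received("Summarize the quarterly report cf-2491 for me")
mean = sum(s for _, s in scores) / len(scores)
for tok, s in scores:
    if s > 3 * mean:  # arbitrary outlier threshold (assumption)
        print(f"token {tok!r} draws {s:.3f} attention vs mean {mean:.3f}")
```

A production scanner would likely calibrate such statistics against baselines from known-clean models rather than a fixed multiplier, but the sketch shows the kind of per-token attention signal the article describes.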

  • Microsoft researchers crack AI guardrails with a single prompt
  • DeepSeek took off as an AI superstar a year ago - but could it also be a major security risk? These experts think so
  • OpenAI admits new models likely to pose 'high' cybersecurity risk

How the scanner detects ...


Copyright of this story belongs to techradar.com.