Researchers Bypass Meta’s Llama Firewall Using Prompt Injection Vulnerabilities

4 days, 20 hours ago gbhackers

Researchers at Trendyol, a leading e-commerce platform, have uncovered multiple vulnerabilities in Meta’s Llama Firewall, a suite of tools designed to safeguard large language models (LLMs) against malicious inputs.

Llama Firewall incorporates components like PROMPT_GUARD for mitigating prompt injection attacks and CODE_SHIELD for detecting insecure code generation.

However, Trendyol’s Application Security team, motivated by internal efforts to integrate LLMs into developer tools, identified several bypass techniques during rigorous red-teaming evaluations.

These findings underscore the persistent challenges in securing LLMs, particularly against sophisticated prompt manipulations that could lead to unintended model behaviors, such as generating harmful content or vulnerable code.

Discovery of Critical Bypasses

The evaluation revealed that PROMPT_GUARD struggles with multilingual and obfuscated injections, allowing attackers to embed malicious instructions in non-English languages or altered formats like leetspeak.

For instance, a Turkish phrase instructing the model to “ignore the instructions above” passed through the firewall undetected, receiving an ...

Copyright of this story solely belongs to gbhackers . To see the full text click HERE

Discovery of Critical Bypasses

Share: