Gains and Risks for Enterprises With DeepSeek V3.1

bankinfosecurity

Splx Says Hardened Prompts Lower Hallucinations But Security Gaps Persist

Rashmi Ramesh (rashmiramesh_) • September 23, 2025

Image: Juan Alejandro Bernal/Shutterstock

DeepSeek is touting its newest model as its entry into the "agent era," and performance benchmarks show a notable leap in capabilities. Security testing, however, shows both progress and persistent vulnerabilities in the Chinese company's upgraded V3.1 model.

The benchmark gains are measured against the prior versions DeepSeek-V3-0324 and DeepSeek-R1-0528. On SWE-bench Verified, a test of software bug-fixing, DeepSeek-V3.1 scored 66, compared with scores in the mid-40s for prior models. On SWE-bench Multilingual, which measures bug-fixing across multiple programming languages, it reached 54.5, nearly doubling earlier results. And on Terminal-Bench, which evaluates command-line reasoning, V3.1 hit 31.3, up from low double-digit scores in previous releases.

Security company Splx ran the model through its AI red-teaming framework ...

Copyright of this story belongs solely to bankinfosecurity.
