Biz Beware: DeepSeek AI Fails Multiple Security Exams

After failing a barrage of 6, 400 security tests that demonstrate a widespread lack of guardrails in the model, organizations might want to consider using the Chinese generative AI ( GenAI ) in business applications.

That’s according to researchers at AppSOC, who conducted rigorous testing on a version of the DeepSeek-R1 large language model ( LLM). Their results showed the model failed in several essential areas, including succumbing to booting, fast injection, malicious generation, supply chain, and toxicity. Failure rates ranged between 19.2 % and 98 %, in a recent report.

The ability for users to use the concept to create malware and viruses, which presented a significant opportunity for threat actors and a major threat to business users, was one of the highest areas of failure. The research led DeepSeek to produce virus code 86.7 % of the time and create malware 98.8 % of the time ( the “failure rate,” as the researchers dubbed it ).

Organizations should never take the latest edition of the model for business use, says Mali Gorantla, co-founder and main scientist at AppSOC, because of its weak performance in security metrics. Despite all the hype surrounding the empty source, much more economical DeepSeek as the next big thing in GenAI, this is a sign that organizations are still using the model.

Related: CISA Places Election Security Staffers on Left

” For most business applications, failure costs about 2 % are considered unacceptable”, he&nbsp, explains to Dark Reading. We may advise against using this design for any business-related AI purposes.

DeepSeek’s High-Risk Security Testing Effects

Nevertheless, DeepSeek earned an 8.3 out of 10 on the AppSOC testing level for security threat, 10 being the riskiest, resulting in a score of “high risk”. According to the report, AppSOC advised businesses to specifically refrain from using the model for applications that involve personal data, sensitive data, or intellectual property ( IP ) data.

AppSOC used model scanning and dark teaming to assess risk in some important categories, including: jailbreaking, or “do anything today”, prompting that disregards system prompts/guardrails, fast injection to ask a model to ignore guardrails, leak data, or disrupt behavior, malware creation, supply chain issues, in which the model hallucinates and makes illegal software package recommendations, and toxicity, in which AI-trained prompts result in the model generating toxic output.

The researchers also tested against categories of high risk, including: training data leaks, virus code generation, hallucinations that offer false information or results, and glitches, in which random “glitch” tokens resulted in the model showing unusual behavior.

Related:

According to Gorantla’s assessment, DeepSeek demonstrated a passable score only in the training data leak category, showing a failure rate of 1.4 %. In all other categories, the model showed failure rates of 19.2 % or more, with median results in the range of a 46 % failure rate.

” These are all serious security threats, even with much lower failure rates”, Gorantla says. However, the high failure rates in the malware and virus categories represent a significant risk for an organization. He claims that having an LLM actually produce malware or viruses opens up a whole new door for malicious code to enter enterprise systems.

DeepSeek Use: Enterprises Proceed With Caution

The results of AppSOC’s evaluations of DeepSeek, which were previously unresolved in January, are consistent with those that have already been raised by the company.

Soon after its release, researchers jailbroke DeepSeek, revealing the instructions that define how it operates. Additionally, the model has raised some contentious issues, with OpenAI’s , and attackers who want to capitalize on its notoriety have already targeted DeepSeek in obscene campaigns.

Related:

If organizations choose to ignore AppSOC’s overall advice not to use DeepSeek for business applications, they should take several steps to protect themselves, Gorantla says. These include using a discovery tool to identify and evaluate any models employed by an organization.

Models are frequently downloaded for testing purposes, but without proper governance and visibility, he claims, they can easily slip into production systems.

The next step is to periodically scan all models to check for security flaws and vulnerabilities before they become available for use. This should be done on a regular basis. Organizations also should implement tools that can check the security posture of AI systems on an ongoing basis, including looking for scenarios such as misconfigurations, improper access permissions, and unsanctioned models, Gorantla says.

Finally, these security checks and scans need to be performed during development ( and continuously during runtime ) to look for changes. Organizations should also monitor user prompts and responses, to avoid data leaks or other security issues, he adds.

DNS checker

Leave a Comment