CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks

Chinese developer lags behind US in performance, cost, security, and adoption

NIST
Wed, 10/29/2025 - 12:03

(NIST: Gaithersburg, MD) -- The Center for AI Standards and Innovation (CAISI) at the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) evaluated AI models from the People’s Republic of China (PRC) developer DeepSeek and found that they lag behind U.S. models in performance, cost, security, and adoption.

“Thanks to President Trump’s AI Action Plan, the Department of Commerce and NIST’s Center for AI Standards and Innovation have released a groundbreaking evaluation of American vs. adversary AI,” says Secretary of Commerce Howard Lutnick. “The report is clear that American AI dominates, with DeepSeek trailing far behind. This weakness isn’t just technical. It shows why relying on foreign AI is dangerous and shortsighted. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce will ensure continued U.S. leadership in AI.”

The CAISI evaluation also notes that the DeepSeek models’ shortcomings related to security and censorship of model responses may pose a risk to application developers, consumers, and U.S. national security. Despite these risks, DeepSeek is a leading developer and has contributed to a rapid increase in the global use of models from the PRC.

CAISI’s experts evaluated three DeepSeek models (R1, R1-0528, and V3.1) and four U.S. models (OpenAI’s GPT-5, GPT-5-mini, and gpt-oss; and Anthropic’s Opus 4) across 19 benchmarks spanning a range of domains. These evaluations include state-of-the-art public benchmarks as well as private benchmarks built by CAISI in partnership with academic institutions and other federal agencies.

The evaluation from CAISI responds to the Trump administration’s America’s AI Action Plan, which directs CAISI to conduct research and publish evaluations of frontier models from the PRC. CAISI is also tasked with assessing the capabilities of U.S. and adversary AI systems; the adoption of foreign AI systems; the state of international AI competition; and potential security vulnerabilities and malign influence arising from the use of foreign AI systems.

CAISI serves as the industry’s primary point of contact within the U.S. government to facilitate testing, collaborative research, and best-practice development related to commercial AI systems, and is a key element in NIST’s efforts to secure and advance American leadership in AI.

Key findings

DeepSeek performance lags behind the best U.S. reference models: The best U.S. model outperforms the best DeepSeek model (DeepSeek V3.1) across almost every benchmark. The gap is largest for software engineering and cyber tasks, where the best U.S. model evaluated solves more than 20% more tasks than the best DeepSeek model.

DeepSeek models cost more to use than comparable U.S. models: To perform at a similar level across all 13 performance benchmarks tested, one U.S. reference model costs 35% less on average than the best DeepSeek model.

DeepSeek models are far more susceptible to agent hijacking attacks than frontier U.S. models: Agents based on DeepSeek’s most secure model (R1-0528) were, on average, 12 times more likely than evaluated U.S. frontier models to follow malicious instructions designed to derail them from user tasks. Hijacked agents sent phishing emails, downloaded and ran malware, and exfiltrated user login credentials, all in a simulated environment.

DeepSeek models are far more susceptible to jailbreaking attacks than U.S. models: DeepSeek’s most secure model (R1-0528) responded to 94% of overtly malicious requests when a common jailbreaking technique was used, compared with 8% of requests for U.S. reference models.

DeepSeek models advance Chinese Communist Party (CCP) narratives: DeepSeek models echoed four times as many inaccurate and misleading CCP narratives as U.S. reference models did.

Adoption of PRC models has greatly increased since DeepSeek R1 was released: The release of DeepSeek R1 has driven adoption of PRC models across the AI ecosystem. Downloads of DeepSeek models on model-sharing platforms have increased nearly 1,000% since January 2025.

© 2025 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest" is a trademark owned by Quality Circle Institute Inc.
