Using Chinese AI models safely — the risk is in how you run them

Some of the best open-weight models in the world come from Chinese labs: DeepSeek, Qwen, Kimi, and GLM (from Zhipu AI, a Tsinghua spin-out; its GLM‑5.2 ships a 1M-token context built for agentic coding and is served internationally under the z.ai brand). For a US company that’s a genuine opportunity and a genuine concern at the same time. The concern is real but widely misunderstood: the data risk is dominated by how you deploy the model, not by who built it. A model you download and run on your own hardware behaves very differently, legally and technically, from a prompt you send to a server in China. This page walks the real risk surface, separates it by deployment mode, and gives a concrete plan to reduce each part.

First, the distinction that changes everything

“Is it safe to use a Chinese model?” is the wrong question, because it bundles two very different things: where the model was made and where the inference runs. Country of origin is a property of the maker; data jurisdiction is a property of the host. They are not the same, and conflating them leads teams either to reject excellent open models for no real benefit or to pipe sensitive data to a Chinese endpoint without realizing it.

An open-weight model is a file — a few hundred gigabytes of numbers. Running it on your own hardware sends nothing to its maker; the weights don’t “call home.” Sending a prompt to a company’s hosted API, by contrast, puts that prompt under the laws and reach of wherever that server sits. So the same DeepSeek model can be one of the safest options or one of the riskiest, depending entirely on which of those you do.

The three deployment modes, by data risk

Self-hosted open weights — lowest data risk

You download the open weights (from Hugging Face or a reputable mirror) and run inference on your own servers, a US/EU cloud GPU, or on-prem hardware. No prompt or output ever leaves your environment — there is no Chinese server in the loop, and PRC data law has no purchase on data that never crosses the border. For a US client handling sensitive, regulated, or proprietary data, this is the mode that makes a Chinese-origin model genuinely usable. The residual risks here are about model behavior and supply chain, not data exfiltration — covered below.

Chinese-made weights, US/neutral host — low data risk

A Chinese-origin open model served by a US or neutral inference provider (Together, Fireworks, DeepInfra, NVIDIA NIM, SambaNova, Cerebras, Groq, or a US-region route on a gateway) runs on that provider’s infrastructure, under that provider’s jurisdiction. Jurisdiction follows the host, not the maker, so the data does not go to China. You inherit that provider’s data terms (read them, and confirm the actual serving region), plus the same model-behavior caveats as self-hosting. This is a reasonable middle ground when you want managed inference without running GPUs yourself.

Pinging a China-hosted API — highest data risk

Sending prompts to a Chinese company’s own endpoint (DeepSeek’s API, a maker’s first-party service, or worst of all a free consumer chat app) puts your data squarely inside PRC jurisdiction. As the origins note spells out, that means the Data Security Law, the Personal Information Protection Law (PIPL), data-localization rules, security-review requirements, and potential government-access and retention obligations all apply to what you send. Free consumer endpoints are the sharpest edge: their terms of service often grant broad rights to use submitted data for training and other purposes. For regulated, PII, or proprietary data, this is the path to avoid.
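
Because the three modes differ only in where the request goes, one cheap safeguard is to refuse to call any inference endpoint that has not been explicitly approved. The following is a minimal sketch under stated assumptions, not a complete control: the hostnames are placeholders, and check_inference_endpoint is a hypothetical helper name. A hostname check only guards against misconfiguration; it does not prove where a provider actually serves the model, which still has to be confirmed in the provider's terms.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list. Replace these placeholders with the hosts your
# organization has approved after confirming serving region and data terms.
APPROVED_INFERENCE_HOSTS = frozenset({
    "llm.internal.example.com",        # self-hosted endpoint (placeholder)
    "us.inference-provider.example",   # US/neutral managed host (placeholder)
})


def check_inference_endpoint(base_url: str) -> str:
    """Return the hostname if it is approved; raise ValueError otherwise."""
    parts = urlsplit(base_url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {base_url!r}")
    # urlsplit lowercases the hostname; the extra lower() is defensive.
    host = (parts.hostname or "").lower()
    if host not in APPROVED_INFERENCE_HOSTS:
        raise ValueError(f"endpoint host {host!r} is not on the approved list")
    return host
```

Calling this once where the client is constructed, rather than on every request, keeps the check in a single place that code review can watch.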

What this looks like in practice (mid-2026)

This isn’t hypothetical: US developers and companies are already choosing Chinese models, and the pattern matches the modes above. The pull is cost. One developer reported coding at roughly $10/hour on Claude versus under $0.50/hour on DeepSeek; the SF startup Lindy said switching from Anthropic to DeepSeek “saved the firm millions.” Adoption is visibly climbing: on Vercel’s platform, DeepSeek’s share of tokens processed jumped from under 1% to about 17% in May 2026 (though its revenue share stayed near 1%, because these are cheap tokens), and on OpenRouter, models from DeepSeek, Tencent, MiniMax, and Xiaomi ranked among the most used.

And the deployment-mode distinction is exactly what the real disputes turn on. When Airbnb and Anysphere (maker of Cursor) disclosed using Qwen and Kimi, US lawmakers opened investigations, and Airbnb’s CEO clarified that the company was not sending any data to the model developers. That is the whole point of this page in one news cycle: the question regulators and customers actually care about is where the inference runs and where the data goes, not the flag on the lab. Tellingly, businesses that adopt these models commonly do so through American cloud providers so that the data is processed domestically (the “Chinese weights on a US/neutral host” mode above), while large regulated firms still hold back over data-security, censorship, and geopolitical risk.

Adoption figures and examples: Rest of World, “When Americans choose Chinese AI” (2026); Vercel and OpenRouter usage as reported therein.

The risks that survive even when no data leaves

Self-hosting closes the data-exfiltration question, but two classes of risk remain and deserve honest attention — they apply to any third-party weights, and are simply more scrutinized for models from a strategic rival.

Model behavior. Models reflect their training. Chinese models can carry embedded political bias and will refuse or deflect on certain topics (Tiananmen, Taiwan, Xinjiang, criticism of the Party). That is usually irrelevant to a B2B workflow but disqualifying for some use cases, and worth testing on your tasks rather than assuming. More subtly, the research community studies data poisoning and backdoors: weights trained so that a specific trigger phrase elicits attacker-chosen behavior. This has been demonstrated in academic settings and has not been shown to be common in major open releases, but it is the reason high-assurance deployments evaluate behavior rather than trust reputation.
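
Testing refusals on your own tasks can start very small. The sketch below assumes you can wrap your deployment in a plain function that takes a prompt and returns the reply text; generate, REFUSAL_MARKERS, and both helper names are illustrative, not part of any library. Keyword matching is a crude proxy for a refusal and will miss polite deflections, so treat the number as a first signal and read a sample of replies by hand.

```python
from typing import Callable, Iterable

# Crude, hypothetical refusal markers. Tune them to the phrasing your model
# actually uses when it declines or deflects.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i'm unable",
    "unable to discuss",
    "let's talk about something else",
)


def looks_like_refusal(reply: str) -> bool:
    """Heuristic: does the reply contain any known refusal marker?"""
    text = reply.lower().replace("’", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(generate: Callable[[str], str], prompts: Iterable[str]) -> float:
    """Fraction of prompts whose reply looks like a refusal."""
    replies = [generate(prompt) for prompt in prompts]
    if not replies:
        raise ValueError("need at least one prompt")
    return sum(looks_like_refusal(reply) for reply in replies) / len(replies)
```

Running the same prompt set against two candidate models gives a like-for-like comparison on your workload, which is more informative than either model's published benchmarks.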

Supply chain. The bigger practical risk is the download itself. A model repo can ship malicious code: the legacy pickle format can execute arbitrary code on load, which is exactly why safetensors exists — it stores only tensors, with no code path. Pull weights only from reputable sources, verify published checksums, prefer safetensors over pickle, and scan model repos before loading. This is generic open-source hygiene, not a China-specific step — but it’s the one most teams skip.
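
The checksum and format checks can be scripted before anything is loaded. This is a minimal sketch using only the Python standard library. It assumes you already hold a mapping of file names to SHA-256 digests published by the model's source; audit_model_dir and the suffix list are illustrative choices, not a standard. Flagging files by extension is only a first-pass filter and is no substitute for a dedicated scanner.

```python
import hashlib
from pathlib import Path

# Suffixes that commonly indicate pickle-based serialization. This list is an
# assumption; extend it for your own stack.
PICKLE_LIKE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".pickle", ".ckpt"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte shards never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_model_dir(model_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # 1. Every expected file must exist and match its published digest.
    for name, want in expected.items():
        file_path = model_dir / name
        if not file_path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(file_path) != want.lower():
            problems.append(f"checksum mismatch: {name}")
    # 2. Flag anything that looks like pickle-based serialization.
    for file_path in sorted(model_dir.rglob("*")):
        if file_path.is_file() and file_path.suffix.lower() in PICKLE_LIKE_SUFFIXES:
            problems.append(f"pickle-like file: {file_path.relative_to(model_dir)}")
    return problems
```

Treating any non-empty result as a hard stop in the deployment pipeline turns the hygiene step from a habit into a gate.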

How to diminish the risk — the checklist

None of this is exotic. A US client can use a top Chinese open model safely by making a few deliberate deployment choices:

Self-host for anything sensitive. Run open weights on your own hardware or a US/EU cloud. The self-hosting lever often pays for itself at volume anyway — the security win is a bonus.
If you use an API, pick a US/neutral host, never the maker’s China endpoint, and confirm the serving region and data terms. The same model on Together/Fireworks/DeepInfra/NVIDIA NIM keeps the data out of China.
Verify the weights. Reputable source, checksum match, prefer safetensors, scan for unsafe serialization before loading.
Isolate the network for self-hosted inference. Run egress-restricted or air-gapped so even a misbehaving model can’t reach the outside.
Govern the data. Classify it; keep regulated, PII, and proprietary data off any cross-border path; add DLP and egress controls on the chat surface itself.
Test behavior on your tasks. Don’t assume; measure. Benchmark refusals, bias, and accuracy on your real workload, which is the entire point of measuring models rather than trusting a brand.
Align with your compliance posture. Map the choice to HIPAA / FedRAMP / CMMC / export-control obligations, and watch the policy backdrop — the governance section tracks US actions restricting model access, which can change what’s permitted.
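
As one illustration of the "govern the data" item, a pre-send screen can stop obviously sensitive prompts from reaching any endpoint outside your environment. This is a deliberately small sketch: the two regular expressions (email addresses and US Social Security number formatting) are illustrative assumptions, the function names are hypothetical, and a real DLP control needs far broader coverage than pattern matching.

```python
import re

# Illustrative patterns only. Real DLP needs much wider coverage.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(prompt: str) -> list[str]:
    """Return the names of the patterns that match the prompt, sorted."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)
    )


def guard_prompt(prompt: str, destination_is_external: bool) -> str:
    """Pass the prompt through unless it is sensitive and leaving your environment."""
    hits = find_sensitive(prompt)
    if destination_is_external and hits:
        raise PermissionError(f"blocked: prompt matches {hits}")
    return prompt
```

The same screen can log matches on self-hosted traffic without blocking it, which shows how often sensitive data reaches the chat surface in the first place.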

The bottom line, as a decision

Chinese open models are often excellent and openly licensed; rejecting them outright leaves real capability and cost savings on the table. The risk is real but manageable, and it is decided almost entirely by deployment mode:

Deployment mode                                           Data risk   Use it when…
Self-hosted open weights (your hardware or US/EU cloud)   Lowest      Sensitive / regulated / proprietary data
Chinese weights on a US/neutral host                      Low         Managed inference, non-secret-grade data
Chinese-hosted first-party API                            High        Public / non-sensitive data only
Free Chinese consumer chat app                            Highest     Avoid for any business data

The one-line version: an open Chinese model on your own hardware is a file doing math; the same model called over a Chinese API is your data leaving the building. Choose the mode that matches your data, verify the weights, isolate the network, and measure the behavior — and a frontier Chinese model becomes a defensible, cost-effective tool rather than a liability.