After DeepSeek-V4: The East-West Divide in Open-Source AI Models
How DeepSeek’s cost war is forcing NVIDIA to rethink the open-weight AI ecosystem
This article originally appeared on Weijin’s WeChat Official Account on April 28, 2026 as a follow-up piece to DeepSeek-V4: A Need for Reassessment. Original Chinese title: 「DeepSeek-V4之后,开源模型的东西方阵营」. It has been translated and adapted for an English-speaking audience.
DeepSeek, once a favorite of Jensen Huang, is set to become a direct competitor to the ecosystem of open-weight models he is building outside of China, following the release of its V4 version.
Before Huawei's Ascend 950 clusters are deployed at scale in the second half of the year, DeepSeek-V4 has already begun a dramatic price push, with reported input costs for cache hits falling to extremely low levels. Headline claims quoted in terms of “USD per 1 billion tokens” should be treated as directional rather than literal, but they underscore a real trend toward aggressive cost compression. Perhaps Huang’s plan to establish a new order for open-weight models should be accelerated.
Last month, NVIDIA expanded its Nemotron series, with reported variants around 100B and in the 400B–500B range. References to “Nemotron 3 Super” (120B) and “Nemotron 3 Ultra” (480B) are broadly consistent with industry chatter, although exact naming, specifications, and open release status have not all been formally confirmed. If fully released, the larger variant would rank among the biggest open-weight models developed in the United States, though, like most of its peers, it would be open-weight rather than strictly “open-source” in the permissive sense.
Huang’s shrewdness, DeepSeek’s deep calculations
In addition, media reports have suggested that NVIDIA is considering a plan to invest up to $26 billion across models, infrastructure, talent, and ecosystem development over several years. This figure has not been formally confirmed in detail, but it aligns directionally with the company’s scale and recent moves. The later debut of a consortium-style ecosystem around GTC 2026 is consistent with that trajectory, even if the linkage is not explicitly verified.
The US is not lacking in open-weight model labs. On the contrary, there are many: somewhere between a dozen and twenty groups are actively releasing high-quality models, broadly comparable in technical capability to their Chinese peers. However, their models often have smaller total parameter counts and, more importantly, stricter licensing terms, which can limit downstream adoption and community influence.
Examples include Allen Institute for AI’s OLMo, Hugging Face’s SmolLM, IBM’s Granite, and work from Stanford University, alongside models such as Microsoft’s Phi and Google’s Gemma. References such as “OpenAI GPT OSS” describe open-weight releases and should be treated cautiously, as OpenAI has not released a fully open-source flagship model, with weights, code, and training data, in the traditional sense. Similarly, Anthropic has remained closed, while xAI has signaled openness but has not yet delivered a major fully open release. Meta’s Llama models are open-weight but come with licensing restrictions, rather than being fully open-source.
The Chinese open-weight model camp that has most visibly impacted Silicon Valley and Wall Street includes flagship models from DeepSeek, Alibaba (Qwen), Moonshot AI (Kimi), MiniMax, and Zhipu AI (GLM). These are among the main drivers behind the narrative that cheap tokens are going global.
Against this backdrop, the US open-weight ecosystem has been regaining momentum since late last year. NVIDIA and companies like Arcee are active participants, while startups such as Reflection AI have raised significant funding and positioned themselves, at least rhetorically, as American counterparts to DeepSeek, although exact funding figures and comparisons should be treated with caution.
At the beginning of the year, Arcee emphasized its “Made in America” positioning and released its Trinity-Large-Base model, reportedly around the 400B parameter scale. Details such as training on thousands of NVIDIA chips are plausible but not publicly verified. In terms of parameter size, it sits in the same broad class as Llama 405B-scale models and other large open-weight systems. The anticipated Nemotron large variants would also fall into this range.
However, with delays around Meta’s ultra-large “Behemoth”-style models, Chinese labs appear to be moving faster toward trillion-parameter-scale architectures. This includes reported models such as Ant Group’s Ring-1T, Kimi-K2.6 (around 1T), and DeepSeek-V4-Pro (reported up to ~1.6T total parameters), as well as Zhipu AI’s GLM series approaching the high hundreds of billions. Rumored future models such as Kimi-K3 at multi-trillion scale remain speculative. As noted, total parameter count, especially in Mixture-of-Experts systems, does not directly translate to performance, but it does indicate scaling direction.
Open models and Huang’s perspective
Huang’s broader framework, sometimes described as a layered or “stack” view of AI, highlights that widely available models drive demand across the entire computing ecosystem. While exact phrasing like the “five-layer cake theory” varies by context, the underlying idea is consistent with his public messaging.
With startups like Cursor achieving very high valuations and building on open-weight ecosystems, including some influenced by Chinese models, US concern about maintaining leadership in frontier open models is increasing. DeepSeek’s technical disclosures noting validation across both NVIDIA GPUs and Huawei Ascend NPUs further reinforce the significance of cross-hardware portability.
Chinese open-source models are gradually expanding toward more independent computing infrastructure. The key strategic concern is not just model quality, but the possibility that these models, running on non-NVIDIA systems such as those from Huawei, could scale globally on the basis of cost efficiency. This would challenge both CUDA’s centrality and the dominance of closed-source US models.
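In practice, "portability" simply means the same model code can select either backend at runtime. Here is a minimal sketch, assuming PyTorch with Huawei's torch_npu plugin installed; the plugin import and the "npu" device string follow the Ascend adapter's documented convention, and the details should be treated as an assumption rather than verified usage:

```python
import torch

# Huawei's Ascend adapter (torch_npu) registers an "npu" device type
# with PyTorch. The import fails harmlessly on machines without it.
try:
    import torch_npu  # noqa: F401
    HAS_NPU = torch.npu.is_available()
except (ImportError, AttributeError):
    HAS_NPU = False

# Pick whichever accelerator is present: NVIDIA (CUDA), Huawei (Ascend),
# or fall back to CPU. The model code downstream is unchanged.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif HAS_NPU:
    device = torch.device("npu")
else:
    device = torch.device("cpu")

x = torch.randn(1, 16, device=device)
print(device, x.sum().item())
```

The strategic point is that once this selection layer exists, the hardware underneath becomes swappable, which is exactly what weakens lock-in.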
As a result, Huang is moving to shape the ecosystem more directly, supporting a network of open-weight models aligned with NVIDIA hardware and software. NVIDIA is not new to model development: it has built model families from Megatron-LM, which predates ChatGPT, through multiple Nemotron generations since, although its branding has indeed been somewhat fragmented.
In discussions with researchers such as Nathan Lambert, NVIDIA’s Bryan Catanzaro has said that Nemotron serves both internal research needs, particularly for anticipating hardware evolution, and broader ecosystem development goals. His background, including earlier work alongside figures like Andrew Ng, reflects continuity between earlier deep learning waves and current large-scale model efforts.
He has also noted that early progress was slowed by fragmented, bottom-up experimentation, with later efforts consolidating into a more coordinated strategy. Reports of a large integrated team spanning multiple research areas are credible, though exact headcount comparisons with DeepSeek should be treated as approximate rather than definitive.
According to available disclosures, DeepSeek’s team is smaller but highly focused, with a few hundred contributors across technical and operational roles. NVIDIA, by contrast, likely fields a significantly larger effort across its AI initiatives, though calling it the single most powerful open-source model company in the US is still an interpretive claim rather than a strict fact.
At the same time, NVIDIA’s growing role in model development does raise concerns among its own customers, creating tension between platform provider and competitor. This helps explain Huang’s push toward a more structured ecosystem approach.
In March, reports again pointed to NVIDIA planning tens of billions of dollars in investment across AI infrastructure and ecosystem development. While details remain incomplete, the direction is clear.
At GTC 2026, NVIDIA formalized part of this strategy through a consortium-style initiative, bringing together model developers such as Mistral AI, Reflection AI, Sarvam AI, and Black Forest Labs, alongside tooling players like LangChain and application-layer companies including Perplexity AI. This reflects a coordinated attempt to shape the next phase of the open-weight AI ecosystem.
DeepSeek, once publicly praised by Jensen Huang, is increasingly emerging not as a partner but as a structural competitor to NVIDIA’s vision for a global open-source AI ecosystem. Ahead of the large-scale deployment of Huawei Ascend 950 clusters later this year, DeepSeek-V4 has already triggered another round of aggressive pricing. Reports of ultra-low marginal costs for cached inference, while sometimes overstated in phrasing, point to a real trend: Chinese labs are compressing token costs faster than their Western counterparts, particularly at scale. Pricing, not just capability, is becoming the decisive battleground for model adoption.
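To see why cached-inference pricing matters so much, a back-of-the-envelope sketch helps; every number below is hypothetical and deliberately not tied to any published price list:

```python
# Back-of-the-envelope blended input cost per million tokens.
# The prices and cache-hit rates below are hypothetical illustrations,
# not DeepSeek's (or anyone's) actual published pricing.

def blended_input_cost(price_hit: float, price_miss: float, hit_rate: float) -> float:
    """Blended $ per 1M input tokens, given a prompt-cache hit rate."""
    return price_hit * hit_rate + price_miss * (1.0 - hit_rate)

price_hit = 0.03   # $/1M input tokens on a cache hit (hypothetical)
price_miss = 0.30  # $/1M input tokens on a cache miss (hypothetical)

for hit_rate in (0.0, 0.5, 0.9):
    cost = blended_input_cost(price_hit, price_miss, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${cost:.3f} per 1M input tokens")
```

At a 90 percent hit rate, the blended cost sits close to the cache-hit floor, which is why multi-turn chat and agentic workloads, where prompts repeat heavily, feel price cuts first.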
In parallel, NVIDIA has expanded its own model efforts. The latest iterations of the Nemotron series, including variants reportedly in the roughly 100 billion to 500 billion parameter range, are positioned less as standalone competitors and more as reference architectures for an ecosystem built around CUDA. While these models are often described as open-source, most US models, including Nemotron, Llama, and Gemma, are more accurately open-weight with restrictions rather than fully permissive open-source in the traditional sense. This distinction helps explain why their community traction can lag behind leading Chinese releases.
The United States does not lack capable model labs. Organizations such as the Allen Institute for AI, Hugging Face, IBM, and Stanford University continue to produce credible models. However, two structural constraints recur: licensing tends to be more restrictive, and cost-performance positioning is often less aggressive. By contrast, Chinese players including Alibaba through its Qwen series, Zhipu AI with GLM, and Moonshot AI with Kimi have focused more directly on developer adoption through pricing and accessibility, not just benchmark performance.
The discussion around trillion-parameter models reflects a real directional shift but requires some calibration. Many of the largest Chinese models, including DeepSeek-V4, are believed to rely on Mixture-of-Experts architectures, where total parameter count is not equivalent to the number of parameters active per token. As a result, comparisons with dense models such as Llama or Nemotron are imperfect. Parameter scale still signals ambition and direction, but it is no longer a clean proxy for performance.
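As a rough illustration of how far total and active counts can diverge, consider a toy MoE configuration; all dimensions below are illustrative and not DeepSeek's actual architecture:

```python
# Toy comparison of total vs. per-token active parameters in an MoE model.
# Every dimension here is a hypothetical illustration, not a real config.

n_layers = 60          # transformer blocks
d_model = 7168         # hidden size
n_experts = 256        # routed experts per MoE layer
top_k = 8              # experts activated per token by the router
expert_params = 2 * d_model * 2048  # one expert's FFN weights (up + down projection)

total_expert_params = n_layers * n_experts * expert_params
active_expert_params = n_layers * top_k * expert_params

print(f"expert params (total):  {total_expert_params / 1e9:.1f}B")   # ~451.0B
print(f"expert params (active): {active_expert_params / 1e9:.1f}B")  # ~14.1B
# top_k / n_experts = 8/256, so only ~3% of expert weights touch any given token.
```

On these toy numbers, a model advertising roughly 450B total expert parameters activates only about 14B of them per token, which is why MoE totals cannot be compared one-to-one with dense models in the Llama 405B class.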
The more consequential development is infrastructure portability. Chinese open-weight models are increasingly being validated across both NVIDIA GPUs and Huawei NPUs. Huawei’s Ascend stack represents the first credible alternative training and inference ecosystem at scale. Even partial portability weakens CUDA lock-in, which has long underpinned NVIDIA’s dominance.
Rather than simply competing head-on with DeepSeek, NVIDIA is moving to shape the broader system. It is building reference models such as Nemotron, aligning with model developers globally, and anchoring these efforts to CUDA and NVIDIA hardware. The consortium announced around GTC 2026, involving players such as Mistral AI and Perplexity AI, reflects this approach. It functions less as a traditional alliance and more as a coordination layer for ecosystem control. Reports of a $26 billion investment plan spanning models, infrastructure, and ecosystem development are directionally credible given NVIDIA’s scale, but should still be treated as reported rather than fully confirmed in detail.
This is no longer just a model race. China is pushing on cost, scale, and hardware independence, while the US is regrouping around ecosystem coherence and platform control. DeepSeek is not simply another strong model lab. It shows China has access to AI systems that are cheaper, portable across hardware stacks, and globally distributable without reliance on Western infrastructure. That is the development NVIDIA is ultimately responding to.