Some fascinating new rumors about DeepSeek R2 have surfaced. They point to big leaps happening behind the scenes, leaps that could play a big role in the AI world.
DeepSeek R2: What the Rumors Say
Apparently, internal lab data recently leaked from Huawei showing how DeepSeek is using its chips. More importantly, the leak included some early information about the upcoming DeepSeek R2.
Here’s what’s being said:
- DeepSeek R2 is rumored to be a roughly 1.2 trillion parameter model built on a hybrid Mixture of Experts 3.0 architecture (a brief sketch of how MoE routing works follows this list).
- That would be nearly double the size of R1, which has 671 billion parameters.
- It reportedly contains about 78 billion active parameters (those used during each forward pass).
- Training is said to be based on 5.2 petabytes of data — an impressive scale by any standard.
- Training is said to run on a self-developed distributed framework that reaches 82% cluster utilization on Huawei’s Ascend 910B chips.
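Nothing about R2’s internals is public, but the “active parameters” figure is easy to picture: in a Mixture of Experts model, a router sends each token to only a few experts, so only a fraction of the total weights (at the rumored numbers, roughly 78B out of 1,200B, about 6.5%) runs on each forward pass. Below is a minimal, generic top-k MoE layer in PyTorch, with made-up dimensions, purely to illustrate the idea; it is not DeepSeek’s architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Generic top-k Mixture of Experts layer (illustrative only;
    dimensions and expert counts are made up, not DeepSeek R2's)."""

    def __init__(self, d_model=1024, d_ff=4096, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; this is why the
        # "active" parameter count is far smaller than the total count.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(8, 1024)
print(layer(tokens).shape)  # torch.Size([8, 1024])
```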
The Ascend 910 series, developed by Huawei, is critical here. Due to US export restrictions, DeepSeek cannot buy Nvidia’s high-end data-center GPUs such as the A100 and H100. Instead, it has turned to domestic alternatives, and Huawei’s chips fit the bill.
DeepSeek’s Clever Strategy with Huawei Hardware
DeepSeek has openly mentioned using Huawei’s Ascend chips for inference, and it seems to be paying off:
- Huawei’s chips, especially the newer 910C model (basically two 910Bs merged), are said to perform on par with Nvidia’s H100, but with lower energy consumption and cost.
- Hand-optimized CANN kernels and PyTorch-native support make it easier for DeepSeek to transition away from CUDA workflows (see the sketch after this list).
- The move to Huawei chips reportedly brings a 98% cost reduction compared to using Nvidia’s GPUs — incredible for a company looking to scale affordably.
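DeepSeek hasn’t published its inference code, but as a rough picture of what “PyTorch-native support” looks like in practice: Huawei ships a torch_npu adapter that exposes Ascend devices through the usual PyTorch device API. The snippet below is a minimal, hypothetical sketch that assumes torch_npu and the CANN toolkit are installed; exact calls may vary between torch_npu versions.

```python
import torch
import torch_npu  # Huawei's PyTorch adapter for Ascend NPUs (assumes the CANN toolkit is installed)

# Use an Ascend device if one is visible, otherwise fall back to CPU.
device = torch.device("npu:0" if torch.npu.is_available() else "cpu")

# An ordinary PyTorch module moves to the NPU the same way it would move
# to a CUDA GPU; the point is that the workflow barely changes.
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)

with torch.no_grad():
    y = model(x)
print(y.shape, y.device)
```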
It’s not just inference; DeepSeek’s training stack also shows smart engineering:
- They mainly use PyTorch, making their system accessible and familiar to most machine learning engineers.
- They’ve reportedly built compatibility layers that make Huawei hardware look more CUDA-like to existing code, keeping training efficient (a minimal sketch of that pattern follows).
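The details of those compatibility layers aren’t public. A common pattern, shown here purely as an illustration, is a thin device-selection shim so the same training loop runs unchanged whether a CUDA GPU or an Ascend NPU is present; the torch_npu import and the "npu" device string are assumptions about the Ascend setup, and the model and optimizer are placeholders.

```python
import torch

def pick_accelerator() -> torch.device:
    """Return the best available accelerator, hiding the CUDA/NPU difference
    from the rest of the training code. Illustrative shim, not DeepSeek's."""
    try:
        import torch_npu  # noqa: F401  (registers the "npu" backend when installed)
        if torch.npu.is_available():
            return torch.device("npu")
    except ImportError:
        pass
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

# The rest of the training code never names a vendor directly.
device = pick_accelerator()
model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

batch = torch.randn(32, 512, device=device)
loss = model(batch).pow(2).mean()
loss.backward()
optimizer.step()
print(f"one training step completed on {device}")
```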