Huawei Technologies’ advanced data centre architecture, CloudMatrix 384, has enabled the company’s Ascend chips to surpass the performance of Nvidia’s H800 graphics processing units (GPUs) in running DeepSeek’s R1 artificial intelligence (AI) model, according to a technical paper.
Jointly written by researchers from Huawei and Chinese AI infrastructure start-up SiliconFlow, the paper described CloudMatrix 384 as a specialised “AI supernode” that is purpose-built for handling extensive AI workloads.
Huawei expected CloudMatrix “to reshape the foundation of AI infrastructure”, according to the paper released this week. The system consists of 384 Ascend 910C neural processing units (NPUs) and 192 Kunpeng server central processing units (CPUs), interconnected through a unified bus that provides ultra-high bandwidth and low latency.
The paper also described an advanced large language model (LLM) serving solution, dubbed CloudMatrix-Infer, that leverages this infrastructure. It surpassed the performance of some of the world’s most prominent systems in running DeepSeek’s 671-billion-parameter R1 reasoning model.
The architecture reflects US-sanctioned Huawei’s efforts to overcome Washington’s tech control measures, as the company pushes the boundaries of AI system performance.
Data centres are facilities that house large-capacity servers and data-storage systems, with multiple power sources and high-bandwidth internet connections. More enterprises are using data centres to host or manage computing infrastructure for their AI projects.