Chinese fintech giant Ant Group has open-sourced an inference framework for a new type of artificial intelligence (AI) model that it said could make AI systems more efficient, rivalling and even surpassing the comparable framework from Nvidia.
The Alibaba Group Holding affiliate said Monday that its framework, dInfer, was designed for diffusion language models – a newer class of models that generate outputs in parallel, unlike “autoregressive” systems used in large language models (LLMs) such as ChatGPT, which produce text sequentially from left to right. Alibaba owns the Post.
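The distinction between the two generation styles can be sketched in a few lines. This is a toy illustration only, not Ant's dInfer or any real model: an autoregressive decoder makes one model call per token, conditioning on the growing prefix, while a diffusion-style decoder starts from placeholder tokens and refines every position in parallel over a small number of steps. The function names and the dummy "models" below are hypothetical stand-ins.

```python
# Toy contrast of the two decoding styles (hypothetical names, not dInfer's API).

def autoregressive_decode(predict_next, length):
    """Generate one token per step, left to right, conditioned on the prefix."""
    tokens = []
    for _ in range(length):
        tokens.append(predict_next(tokens))  # one model call per token
    return tokens

def diffusion_decode(denoise_all, length, steps):
    """Start from masked placeholders and refine all positions in parallel."""
    tokens = ["<mask>"] * length
    for _ in range(steps):  # typically far fewer steps than tokens
        tokens = denoise_all(tokens)  # one model call updates every position
    return tokens

# Dummy "models" so the sketch runs end to end.
seq = autoregressive_decode(lambda prefix: f"t{len(prefix)}", 8)  # 8 model calls
par = diffusion_decode(lambda toks: ["x" for _ in toks], 8, 2)    # 2 model calls
```

The parallelism is why diffusion language models can, in principle, reach much higher token throughput: the number of model calls scales with the (small) number of refinement steps rather than with the output length.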
Diffusion models are already widely used in image and video generation.
Ant claimed dInfer was up to three times faster than vLLM, an open-source inference engine developed by researchers at the University of California, Berkeley, and 10 times faster than US chipmaking giant Nvidia’s own framework Fast-dLLM.
Autoregressive language models, from OpenAI’s GPT-3.5 to DeepSeek’s R1, have so far powered the chatbot boom due to their strength in understanding and generating human language. But researchers, especially in academic circles, have continued exploring diffusion language models in search of greater capabilities.
Ant’s push into alternative model paradigms underscores how China’s Big Tech firms are stepping up efforts in algorithm and software optimisation to offset the country’s disadvantages in advanced AI chips.
According to internal tests on Ant’s own diffusion model LLaDA-MoE, dInfer generated an average of 1,011 tokens per second on a code-generation benchmark called HumanEval, compared with 91 tokens per second for Nvidia’s Fast-dLLM and 294 for Alibaba’s Qwen-2.5-3B model optimised with vLLM.
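The reported throughput figures line up with the multiples Ant cited; a quick arithmetic check (using only the numbers quoted above):

```python
# Speedup ratios implied by the reported tokens-per-second figures on HumanEval.
dinfer = 1011     # dInfer on Ant's LLaDA-MoE
fast_dllm = 91    # Nvidia's Fast-dLLM
qwen_vllm = 294   # Alibaba's Qwen-2.5-3B served with vLLM

print(round(dinfer / fast_dllm, 1))  # about 11x, matching the "10 times faster" claim
print(round(dinfer / qwen_vllm, 1))  # about 3.4x, matching "up to three times faster"
```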