DeepSeek has signalled its next development priorities in a new technical study, with founder and chief executive Liang Wenfeng among 15 co-authors, that delves into “native sparse attention” (NSA) – a mechanism touted as making artificial intelligence (AI) models more efficient when processing vast amounts of data.
The study, titled “Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention”, was published by DeepSeek on Sunday via arXiv – an open-access online repository for scientific preprints – just a day before Liang, 40, took part in a symposium with tech entrepreneurs hosted by Chinese President Xi Jinping in Beijing.
DeepSeek has sharpened its focus on research as worldwide attention on the Hangzhou-based start-up has increased, and is in no rush to pursue fundraising or new commercial activities, according to a person with knowledge of the matter who declined to be identified.
The study shows how Liang and DeepSeek’s young team of scientists are continuing to push the envelope in their industry, following the start-up’s breakthrough development of advanced open-source AI models, V3 and R1, at a fraction of the cost and computing power that major tech companies typically require for large language model (LLM) projects.
“With optimised design for modern [computing] hardware, NSA speeds up inference while reducing pre-training costs – without compromising performance,” the study said.
Inference refers to the stage at which an AI model, having been trained to see patterns in curated data sets, applies what it has learned to recognise those patterns in data it has never seen before. As a result, the AI model can reason and make predictions that mimic a human’s abilities.
