DeepSeek innovation speeds up processing of long text, paper says

Chinese AI start-up DeepSeek has unveiled a new technology that could allow next-generation language models to process very long text far faster and at much lower cost than conventional methods.

By training AI to focus on key information rather than every word, the company’s “native sparse attention” (NSA) method sped up long-text processing by up to 11 times, according to a paper published by CEO Liang Wenfeng and his team.

The NSA method combines algorithmic innovations with hardware-aligned optimisations to boost efficiency without sacrificing performance, according to the paper, published on Tuesday on arXiv, a platform for preprint papers that have not been peer reviewed.

It could improve AI’s ability to solve complex problems, write large programs and track long conversations, said the team behind R1, the open-source, low-cost model that shook the AI world last month.

“With an optimised design for modern hardware, NSA speeds up inference while reducing pre-training costs – without compromising performance,” DeepSeek posted on X just a day after Elon Musk’s AI company, xAI, released its Grok 3 model.

Video (01:18): Trump: Chinese AI start-up DeepSeek’s strong showing a ‘wake-up call’ for US tech sector

AI models such as ChatGPT use a technique called attention to process text. Just as humans recall earlier words to understand a sentence, AI determines which words are important and how they relate to each other.
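For readers curious about the mechanics, the Python sketch below shows standard (“dense”) attention alongside a simplified sparse variant that keeps only each query’s top-k highest-scoring keys. The function names and the top-k selection rule are illustrative assumptions, not DeepSeek’s method; NSA’s actual hardware-aligned design, combining token compression, token selection and sliding-window attention, is described in the paper and not reproduced here.

```python
# A minimal sketch of the attention idea described above, using NumPy.
import numpy as np

def dense_attention(Q, K, V):
    """Standard scaled dot-product attention: every query attends to every key."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values

def topk_sparse_attention(Q, K, V, k=4):
    """Simplified sparse attention: each query keeps only its k best keys.

    This top-k rule is an illustrative stand-in for sparse attention in
    general, not DeepSeek's NSA.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Threshold at each query's k-th largest score and mask out the rest.
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 16))    # 4 queries, 16-dimensional
K = rng.standard_normal((12, 16))   # 12 keys
V = rng.standard_normal((12, 16))
# With k equal to the number of keys, the sparse variant matches dense attention.
print(np.allclose(dense_attention(Q, K, V), topk_sparse_attention(Q, K, V, k=12)))
```

Dense attention touches every key for every query, so its cost grows quadratically with text length; pruning each query’s attention to a fixed number of keys is the basic intuition behind the speed-ups sparse methods claim.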

