Alibaba upgrades Qwen3 model to outperform OpenAI, DeepSeek in maths, coding

Alibaba Group Holding unveiled an upgraded version of its third-generation Qwen3 family of large language models (LLMs), improving one of its members to score higher in maths and coding than products from rivals OpenAI and DeepSeek.


The new Qwen3-235B-A22B-Instruct-2507-FP8 is an open-source model that achieved “significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage”, according to a Tuesday update posted on the artificial intelligence (AI) community Hugging Face and on ModelScope, Alibaba’s open-source platform. Alibaba owns the Post.

It outperformed some rivals in certain assessments, such as the 2025 American Invitational Mathematics Examination, where the new Alibaba model scored 70.3. By comparison, DeepSeek-V3-0324, the latest version of DeepSeek’s foundational model released in March, scored 46.6, while OpenAI’s GPT-4o-0327 scored 26.7.

As for coding capabilities, the new Qwen scored 87.9 on the MultiPL-E benchmark, slightly higher than the 82.2 and 82.7 posted by the DeepSeek and OpenAI models above, respectively, though it lagged behind Anthropic’s Claude Opus 4 Non-thinking, which scored 88.5.

Alibaba’s new release was an upgrade of the Qwen3-235B-A22B-FP8. But it supports only non-thinking mode, in which an AI system gives a direct output without the explicit reasoning steps, or chain of thought, that a thinking model might employ. At the same time, its context length was boosted eightfold to 256,000 tokens, enabling it to handle longer texts in a single conversation.


Also on Tuesday, Alibaba said a Qwen model with 3 billion parameters would be integrated into HP’s smart assistant “Xiaowei Hui” on the latter’s personal computers in China, enhancing capabilities such as drafting documents and summarising meetings.
