Blog

Chatbot Arena

This leaderboard is based on the following benchmarks. Chatbot Arena - a crowdsourced, randomized battle platform for large language models (LLMs). We use 3.2M+ user votes to compute Elo ratings. MMLU - a test to measure a model’s multitask accuracy on 57 tasks. Arena-Hard-Auto - an automatic evaluation tool for instruction-tuned LLMs.

Open R1

A fully open reproduction of DeepSeek-R1. The goal of this repo is to build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it.

Large-Scale Expert Parallelism

DeepSeek is a popular open-source large language model (LLM) praised for its strong performance. However, its large size and unique architecture, which uses Multi-head Latent Attention (MLA) and Mixture of Experts (MoE), require an advanced system for efficient serving at scale. In this blog, we explain how we match DeepSeek’s inference system performance using prefill-decode disaggregation and large-scale expert parallelism (EP) with SGLang.

Qwen2.5-Omni

We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis.

QwQ

QwQ is the reasoning-specialized model within the Qwen series. Unlike traditional instruction-tuned models, QwQ leverages advanced reasoning and critical thinking abilities to achieve superior performance on downstream tasks, especially those involving complex problem-solving. Our latest release, QwQ-32B, is a mid-sized model that competes effectively with top-tier reasoning models like DeepSeek-R1 and o1-mini, delivering robust and competitive results.

Qwen2.5-VL

In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL. Key Enhancements: Powerful Document Parsing Capabilities: Upgrade text recognition to omnidocument parsing, excelling in processing multi-scene, multilingual, and various built-in (handwriting, tables, charts, chemical formulas, and music sheets) documents.

vLLM V1

We are thrilled to announce the alpha release of vLLM V1, a major upgrade to vLLM’s core architecture. Based on lessons we learned over the past 1.5 years of vLLM development, we revisited key design decisions, consolidated various features, and simplified the codebase to enhance flexibility and scalability. V1 already achieves state-of-the-art performance and is set to gain even more optimizations.

DeepSeek-R1

We introduce DeepSeek’s first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. DeepSeek-R1 incorporates cold-start data before RL, and achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

⌂ • Engineering • Research • Leaderboard • Learning