Qwen3-Coder

We’re announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we’re excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct — a 480B-parameter Mixture-of-Experts model with 35B active parameters, offering exceptional performance in both coding and agentic tasks. It sets new state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use.

MTP in SGLang

SGLang is the first and only open-source serving framework to support Multiple Token Prediction (MTP) in combination with Large-Scale Expert Parallelism (EP) and Prefill-Decode disaggregation. This integration delivers up to 60% higher output throughput through a new decoding paradigm, better parallelism, and more efficient resource utilization without sacrificing generation quality.

slime

slime is an LLM post-training framework for RL scaling, providing two core capabilities: High-Performance Training – Supports efficient training in various modes by connecting Megatron with SGLang; Flexible Data Generation – Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.

Open R1

A fully open reproduction of DeepSeek-R1. The goal of this repo is to build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it.

Agent Course

AI Agents are autonomous systems that can understand user requests, break them down into steps, and execute actions to accomplish tasks. They combine language models with tools and external functions to interact with their environment. This module covers how to build effective agents using the smolagents library, which provides a lightweight framework for creating capable AI agents.

Large-Scale Expert Parallelism

DeepSeek is a popular open-source large language model (LLM) praised for its strong performance. However, its large size and unique architecture, which uses Multi-head Latent Attention (MLA) and Mixture of Experts (MoE), require an advanced system for efficient serving at scale. In this blog, we explain how we match DeepSeek’s inference system performance using prefill-decode disaggregation and large-scale expert parallelism (EP) with SGLang.

Qwen3

We are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. These models represent our most advanced and intelligent systems to date, improving from our experience in building QwQ and Qwen2.5. We are making the weights of Qwen3 available to the public, including both dense and Mixture-of-Expert (MoE) models.

Qwen2.5-Omni

We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis.