Chatbot Arena

This leaderboard is based on the following three benchmarks. Chatbot Arena - a crowdsourced, randomized battle platform for large language models (LLMs). We use 2M+ user votes to compute Elo ratings. MT-Bench - a set of challenging multi-turn questions. We use GPT-4 to grade model responses. MMLU (5-shot) - a test to measure a model’s multitask accuracy on 57 tasks.

Llama 3.2

We’ve been excited by the impact the Llama 3.1 herd of models have made in the two months since we announced them, including the 405B—the first open frontier-level AI model. Today, we’re releasing Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B), and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices.

Qwen2.5

In the past three months since Qwen2’s release, numerous developers have built new models on the Qwen2 language models, providing us with valuable feedback. During this period, we have focused on creating smarter and more knowledgeable language models. Today, we are excited to introduce the latest addition to the Qwen family.

vLLM v0.6

vLLM achieves 2.7x higher throughput and 5x faster TPOT (time per output token) on Llama 8B model, and 1.8x higher throughput and 2x less TPOT on Llama 70B model. A month ago, we released our performance roadmap committing to performance as our top priority. We will start by diagnosing the performance bottleneck in vLLM previously.

SGLang v0.3

We’re excited to announce the release of SGLang v0.3, which brings significant performance enhancements and expanded support for novel model architectures. Here are the key updates: Up to 7x higher throughput for DeepSeek Multi-head Latent Attention (MLA). Up to 1.5x lower latency with torch.compile on small batch sizes. Support for interleaved text and multi-image/video in LLaVA-OneVision.

Qwen2 VL

After a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model familities. Compared with Qwen-VL, Qwen2-VL has the capabilities of: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.

SGLang v0.2

At LMSYS.org, we’ve been running the Chatbot Arena platform for over a year, serving millions of users. We know firsthand how crucial efficient serving is for AI products and research. Through our operational experiences and in-depth research, we’ve continuously enhanced the underlying serving systems, spanning from the high-level multi-model serving framework, FastChat, to the efficient serving engine, SGLang Runtime (SRT).

Llama 3.1

Bringing open intelligence to all, our latest models expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.