DeepSeek-V2
We introduce DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and raising the maximum generation throughput to more than 5 times that of DeepSeek 67B.
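To put the figures above in proportion, here is a small arithmetic sketch. It uses only the numbers quoted in the paragraph; the variable names are illustrative and do not come from any DeepSeek code.

```python
# Figures quoted in the paragraph above.
total_params_b = 236.0      # total parameters, in billions
active_params_b = 21.0      # parameters activated per token, in billions
kv_cache_reduction = 0.933  # KV cache reduction relative to DeepSeek 67B

# Fraction of the model's parameters that is active for any one token.
active_fraction = active_params_b / total_params_b

# Share of the DeepSeek 67B KV cache that remains after the reduction.
kv_cache_remaining = 1.0 - kv_cache_reduction

print(f"active fraction: {active_fraction:.1%}")
print(f"KV cache remaining: {kv_cache_remaining:.1%}")
```

In other words, roughly 8.9% of the parameters are used per token, and the KV cache is about 6.7% of the DeepSeek 67B size.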
DeepSeekMoE
DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters. It employs an innovative MoE architecture built on two principal strategies: fine-grained expert segmentation and shared expert isolation. It is trained from scratch on 2T English and Chinese tokens and achieves performance comparable to DeepSeek 7B and LLaMA2 7B while using only about 40% of the computation.
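The two strategies can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the DeepSeekMoE implementation: the experts are hypothetical scaling functions standing in for small feed-forward networks, the router is a plain dot product followed by a softmax, and details such as the residual connection, gate renormalisation, and load balancing are omitted. "Fine-grained" is represented only by having several small routed experts, and "shared" by experts that every token passes through regardless of routing.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_expert(scale: float) -> Expert:
    """Hypothetical stand-in for a small feed-forward expert: scales its input."""
    return lambda x: [scale * v for v in x]


def moe_forward(
    x: Vector,
    shared_experts: Sequence[Expert],
    routed_experts: Sequence[Expert],
    router_weights: Sequence[Vector],
    top_k: int,
) -> Tuple[Vector, List[int]]:
    """Combine always-on shared experts with the top_k gated routed experts."""
    # Router: one affinity score per routed expert (dot product with its row).
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    gates = softmax(scores)
    # Keep only the top_k routed experts with the highest gate values.
    chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]

    out = [0.0] * len(x)
    # Shared experts process every token, independent of the router.
    for expert in shared_experts:
        for d, v in enumerate(expert(x)):
            out[d] += v
    # Selected routed experts contribute in proportion to their gate value.
    for i in chosen:
        for d, v in enumerate(routed_experts[i](x)):
            out[d] += gates[i] * v
    return out, chosen


# Toy configuration (all values are illustrative assumptions).
x = [1.0, 2.0]
shared = [make_toy_expert(1.0)]
routed = [make_toy_expert(s) for s in (0.1, 0.2, 0.3, 0.4)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, chosen = moe_forward(x, shared, routed, router_weights, top_k=2)
print("routed experts used:", chosen)
print("output:", output)
```

Only the chosen routed experts are evaluated for a given token, which is where the compute saving of an MoE layer comes from; the shared expert runs for every token.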
DeepSeek LLM
Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

Vicuna Chatbot
🚀 Excited to announce the release of the Vicuna v1.5 series, featuring 4K and 16K context lengths with improved performance on almost all benchmarks! Vicuna v1.5 is based on the commercial-friendly Llama 2, and its context length has been extended via positional interpolation. Since its release, Vicuna has become one of the most popular chat LLMs. It has enabled pioneering research on multi-modality, AI safety, and evaluation. Vicuna models have received over 3 million downloads on Hugging Face. The latest version follows the proven recipe and brings fresh enhancements. Let’s keep pushing the boundary of open LLMs!
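The idea behind positional interpolation can be sketched in a few lines: rather than feeding positions beyond the trained range to the rotary position embedding, positions are linearly rescaled so the extended window maps back onto the range seen in training. The code below is a minimal illustration under assumptions, not Vicuna's training code; the function names are hypothetical, and the lengths 4096 and 16384 are taken here as concrete stand-ins for the 4K and 16K contexts mentioned above.

```python
from typing import List


def interpolated_position(position: int, trained_len: int, target_len: int) -> float:
    """Linearly rescale a position in the extended window into the trained range."""
    scale = min(1.0, trained_len / target_len)  # never stretch, only compress
    return position * scale


def rope_angles(position: float, head_dim: int, base: float = 10000.0) -> List[float]:
    """Rotary embedding angles for one position, one angle per pair of dimensions."""
    return [position / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


trained_len = 4096    # assumed base-model context length (4K)
target_len = 16384    # assumed extended context length (16K)

# The last position of the extended window lands inside the trained range.
last = interpolated_position(target_len - 1, trained_len, target_len)
print(last < trained_len)

# Angles are then computed from the rescaled, possibly fractional, position.
angles = rope_angles(interpolated_position(8192, trained_len, target_len), head_dim=8)
print(angles[0])
```

With these values every position is multiplied by 0.25, so position 8192 is treated like position 2048 when the rotation angles are computed.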