We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance on general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as in reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks.