Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are pre-trained solely on extensive raw code data, without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model: it surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are publicly available at https://github.com/nlpxucan/WizardLM