Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce DolphCoder, a diverse instruction model with self-evaluation for code generation. It learns diverse instruction targets and combines them with a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, offering new insights for future code instruction tuning work. Our key findings are: (1) Augmenting the training data with more diverse responses that follow distinct reasoning paths increases the code capability of LLMs. (2) Improving a model's ability to evaluate the correctness of code solutions also enhances its ability to generate them.