Fine-tuning Large Language Models (LLMs) for code generation has advanced notably through the use of open-domain coding queries. Despite this success, existing methods such as \textit{Evol-Instruct} run into performance limitations that impede further gains on code generation tasks. This paper examines the constraints of existing prompt evolution techniques and introduces a novel approach, \textit{Instruction Fusion} (IF). IF combines two distinct prompts through a hybridization process, thereby enhancing the evolution of training prompts for code LLMs. Our experimental results show that the proposed method effectively addresses the shortcomings of prior methods and significantly improves the performance of code LLMs across five code generation benchmarks, namely HumanEval, HumanEval+, MBPP, MBPP+, and MultiPL-E, which underscores the effectiveness of \textit{Instruction Fusion} in advancing the code generation capabilities of LLMs.