Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of downstream tasks. In software engineering, specialized code tasks such as program repair present unique challenges and require fine-tuning to reach state-of-the-art performance. However, the fine-tuning approaches proposed in the literature for LLMs on program repair generally overlook the need to reason about the logic behind code changes, beyond syntactic patterns in the data. High-performing fine-tuning experiments also usually come at very high computational cost. With MORepair, we propose a novel perspective on the learning focus of LLM fine-tuning for program repair: we not only adapt the LLM parameters to the syntactic nuances of the code transformation task (objective 1), but also specifically fine-tune the LLM with respect to the logical reason behind the code change in the training data (objective 2). Such multi-objective fine-tuning instructs LLMs to generate high-quality patches. We apply MORepair to fine-tune four open-source LLMs of different sizes and architectures. Experimental results on C++ and Java repair benchmarks show that the implemented fine-tuning boosts LLM repair performance by 7.6% to 10% in Top-10 repair suggestions. We further show that our fine-tuning strategy yields superior performance compared to the incumbent state of the art among fine-tuned models for program repair, Fine-tune-CoT and RepairLLaMA.
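To make the idea of two training objectives concrete, the following is a minimal sketch of how a combined loss could be computed. The abstract does not state how the two objectives are combined, so the weighted sum used here, the `rationale_weight` parameter, and all function names are illustrative assumptions rather than the MORepair implementation.

```python
import math


def cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference tokens.

    token_probs: probabilities the model assigned to each reference token.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multi_objective_loss(patch_probs, rationale_probs, rationale_weight=0.5):
    """Hypothetical combination of the two objectives (an assumption).

    patch_probs:      probabilities for the tokens of the fixed code (objective 1).
    rationale_probs:  probabilities for the tokens of the explanation of the
                      code change (objective 2).
    rationale_weight: assumed mixing weight in [0, 1]; not taken from the source.
    """
    if not 0.0 <= rationale_weight <= 1.0:
        raise ValueError("rationale_weight must lie in [0, 1]")
    loss_patch = cross_entropy(patch_probs)
    loss_rationale = cross_entropy(rationale_probs)
    return (1.0 - rationale_weight) * loss_patch + rationale_weight * loss_rationale


if __name__ == "__main__":
    # Toy probabilities standing in for real model outputs.
    print(multi_objective_loss([0.9, 0.8, 0.95], [0.6, 0.7], rationale_weight=0.3))
```

In an actual fine-tuning setup, the token probabilities would come from the model's output distribution over the patch and over the natural-language rationale, and the combined loss would be backpropagated through the shared parameters.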