Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next-token-prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked-span-prediction objective of current infilling-style APR methods, which impedes LLMs from fully leveraging their pre-trained knowledge for program repair. In addition, while some LLMs can locate and repair bugs in certain functions using related artifacts (e.g., test cases), existing methods still depend on statement-level fault-localization techniques to provide a list of buggy hunks for repair. This restriction hinders LLMs from exploring patches beyond the given locations. In this paper, we investigate a new approach for adapting LLMs to program repair. Our core insight is that the APR capability of LLMs can be greatly improved simply by aligning the output with their training objective and allowing them to refine the whole program without first identifying faulty statements. Based on this insight, we design D4C, a straightforward prompting framework for APR. D4C correctly repairs 180 bugs in Defects4J with only 10 patch samples per bug. This surpasses the SOTA APR methods that use perfect fault localization by 10% and reduces the number of sampled patches by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting an LLM's pre-trained capability, and (2) replacing the traditional localize-buggy-hunks-then-repair workflow with direct debugging is more effective for LLM-based APR. We believe this paper introduces a new mindset for harnessing LLMs in APR.
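The direct-debugging workflow described in the abstract can be pictured as a single prompt containing the whole buggy function plus its test-failure artifacts, from which a small number of full-function rewrites are sampled. The following is a minimal sketch of that idea, not the paper's actual implementation; all function names (`build_repair_prompt`, `sample_patches`) and the prompt wording are illustrative assumptions.

```python
# Hypothetical sketch of a D4C-style "direct debugging" prompt (names assumed).
# Instead of masking buggy hunks for infilling, the prompt asks the model to
# rewrite the entire function, so the output format matches the decoder-only
# next-token-prediction objective.

def build_repair_prompt(buggy_function: str, failing_test: str,
                        error_message: str) -> str:
    """Assemble one prompt with the full buggy function and its test
    artifacts; no statement-level fault-localization hints are given."""
    return (
        "The following function fails its test suite.\n\n"
        f"### Buggy function\n{buggy_function}\n\n"
        f"### Failing test\n{failing_test}\n\n"
        f"### Error message\n{error_message}\n\n"
        "Rewrite the complete, corrected function:\n"
    )

def sample_patches(prompt: str, model_call, n_samples: int = 10) -> list:
    """Draw a small number of candidate patches (the abstract reports 10
    samples per bug); each candidate is a full rewritten function."""
    return [model_call(prompt) for _ in range(n_samples)]

# Toy stand-in for an LLM call so the sketch runs without an API.
demo_model = lambda p: "def add(a, b):\n    return a + b"

prompt = build_repair_prompt(
    "def add(a, b):\n    return a - b",
    "assert add(1, 2) == 3",
    "AssertionError: expected 3, got -1",
)
patches = sample_patches(prompt, demo_model)
```

Because each candidate is a complete function rather than a masked span, any of the sampled rewrites can be validated directly against the test suite, with no dependence on the fault localizer having picked the right hunk.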