In Automated Program Repair (APR), synthesizing correct patches for real-world systems written in general-purpose programming languages is challenging. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" that assist developers with various coding tasks, and they have also been applied directly to patch synthesis. However, most LLMs treat programs as sequences of tokens, which means they are ignorant of the underlying semantic constraints of the target programming language. This results in many statically invalid generated patches, which limits the practicality of the technique. Therefore, we propose Repilot, a framework that further copilots the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), much as a human writes a program, and that this process can be significantly boosted and guided by a Completion Engine. Repilot synthesizes a candidate patch through the interaction between an LLM and a Completion Engine. This interaction 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes tokens based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely used Defects4J 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, which is 14 and 16 more than the best-performing baseline. More importantly, Repilot produces more valid and correct patches than the base LLM when given the same generation budget.
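The two roles of the Completion Engine described above, pruning and proactive completion, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than Repilot's actual implementation: `toy_llm` replaces a real model's next-token distribution, `ToyCompletionEngine` replaces a real language-server-style engine and only knows a fixed list of valid member names, and the loop picks the most probable feasible token greedily as a simplification.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: maps the text generated so far to next-token probabilities.
Suggest = Callable[[str], Dict[str, float]]


class ToyCompletionEngine:
    """Toy engine (an assumption): the only valid outputs are the given member names."""

    def __init__(self, members: List[str]) -> None:
        self.members = members

    def feasible(self, partial: str) -> bool:
        # A partial name is feasible if some valid member starts with it.
        return any(m.startswith(partial) for m in self.members)

    def unique_completion(self, partial: str) -> Optional[str]:
        # If exactly one member matches, return the text still missing from it.
        matches = [m for m in self.members if m.startswith(partial)]
        if len(matches) == 1:
            return matches[0][len(partial):]
        return None


def guided_generate(suggest: Suggest, engine: ToyCompletionEngine,
                    max_steps: int = 8) -> Optional[str]:
    partial = ""
    for _ in range(max_steps):
        # Proactive completion: finish the name once it is unambiguous.
        rest = engine.unique_completion(partial)
        if rest is not None:
            return partial + rest
        # Pruning: discard LLM suggestions that cannot lead to a valid name.
        ranked = sorted(suggest(partial).items(), key=lambda kv: kv[1], reverse=True)
        feasible = [tok for tok, _ in ranked if engine.feasible(partial + tok)]
        if not feasible:
            return None  # every suggestion was statically invalid
        partial += feasible[0]  # greedy choice (simplification)
    return None


def toy_llm(partial: str) -> Dict[str, float]:
    # Hard-coded distributions standing in for a real model (illustrative only).
    table = {
        "": {"size": 0.5, "get": 0.3, "is": 0.2},
        "get": {"Len": 0.6, "Na": 0.4},
    }
    return table.get(partial, {})


engine = ToyCompletionEngine(["getName", "getNumber", "isEmpty"])
result = guided_generate(toy_llm, engine)
```

In this toy run the model's top choices `size` and `Len` are pruned because no valid member starts with them, and once the partial name reaches `getNa` the engine completes it without consulting the model again.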