Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair

During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" that assist developers with various coding tasks, and they have also been applied directly to patch synthesis. However, most LLMs treat programs as sequences of tokens, which means they are unaware of the underlying semantic constraints of the target programming language. As a result, many generated patches are statically invalid, which limits the practicality of the technique. We therefore propose Repilot, a framework that further copilots the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce output autoregressively (i.e., token by token), much as humans write programs, and that this process can be significantly boosted and guided by a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the current token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely used Defects4J 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs. More importantly, Repilot produces more valid and correct patches than the base LLM under the same generation budget.
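The two mechanisms in the abstract (pruning infeasible tokens and proactively completing tokens) can be illustrated with a toy sketch. This is not Repilot's implementation: the `ToyCompletionEngine` class, the `toy_llm` function, the `generate` helper, the member list, and the `<eos>` marker are all hypothetical. The toy engine only checks which member names may follow the last `.`, and the toy "LLM" returns a fixed ranked list of proposals; a real system would query an actual LLM for token probabilities and a real completion engine (for Java, something like a language server) instead.

```python
import re
from typing import Callable, List, Optional, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker


class ToyCompletionEngine:
    """Hypothetical stand-in for a real completion engine.

    It only knows which member names are valid after the last '.' in the code.
    """

    def __init__(self, members: List[str]) -> None:
        self.members = members

    @staticmethod
    def _split_tail(code: str) -> Optional[Tuple[str, str]]:
        """Return (identifier typed after the last '.', remaining text), or None."""
        dot = code.rfind(".")
        if dot == -1:
            return None  # not inside a member access: the engine has no opinion
        tail = code[dot + 1:]
        name = re.match(r"\w*", tail).group(0)
        return name, tail[len(name):]

    def feasible(self, code: str, token: str) -> bool:
        """Pruning: reject a token if no valid member can follow from the result."""
        split = self._split_tail(code + token)
        if split is None:
            return True
        name, rest = split
        if rest == "":
            # Still typing the member name: it must prefix some known member.
            return any(m.startswith(name) for m in self.members)
        # The member name has ended: it must be exactly a known member.
        return name in self.members

    def unique_completion(self, code: str) -> str:
        """Proactive completion: finish the name if exactly one member matches."""
        split = self._split_tail(code)
        if split is None or split[1] != "":
            return ""
        name = split[0]
        matches = [m for m in self.members if m.startswith(name)]
        return matches[0][len(name):] if len(matches) == 1 else ""


def toy_llm(code: str) -> List[str]:
    """Hypothetical LLM: ranked next-token proposals for a given prefix."""
    if code.endswith("."):
        return ["length", "si", "count"]
    if code.endswith("size"):
        return ["Of", "()", ";"]
    if code.endswith("()"):
        return [";", EOS]
    return [EOS]


def generate(prefix: str, llm: Callable[[str], List[str]],
             engine: ToyCompletionEngine, max_steps: int = 20) -> str:
    code = prefix
    for _ in range(max_steps):
        # 1) Prune: keep the highest-ranked proposal the engine accepts.
        chosen = next((t for t in llm(code)
                       if t == EOS or engine.feasible(code, t)), None)
        if chosen is None or chosen == EOS:
            break
        code += chosen
        # 2) Proactively complete when only one continuation is valid.
        code += engine.unique_completion(code)
    return code


engine = ToyCompletionEngine(["size", "isEmpty", "subList", "stream"])
print(generate("return items.", toy_llm, engine))  # return items.size();
```

In this run the engine rejects `length` and `sizeOf` because they are not in the assumed member list, and it expands the partial `si` to `size` without another LLM query because `size` is the only matching member. The result is `return items.size();`.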
