Automated Program Repair (APR) aims to automatically generate patches that fix software bugs. Recent advances in Large Language Models (LLMs), such as ChatGPT, have yielded encouraging results in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR is contingent on the quality of the feedback information. In this paper, we propose ContrastRepair, a novel conversation-based APR approach that augments conversation-driven APR by providing LLMs with contrastive test pairs. A test pair consists of a failing test and a passing test, which together offer contrastive feedback to the LLM. Our key insight is to minimize the difference between the generated passing test and the given failing test, which better isolates the root cause of the bug. By providing informative and specific feedback, ContrastRepair enables the LLM to produce effective bug fixes. The implementation of ContrastRepair is based on the state-of-the-art LLM, ChatGPT, and it iteratively interacts with ChatGPT until plausible patches are generated. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4J, QuixBugs, and HumanEval-Java. The results demonstrate that ContrastRepair significantly outperforms existing methods, achieving a new state of the art in program repair. For instance, across Defects4J 1.2 and 2.0, ContrastRepair correctly repairs 143 of the 337 bugs, while the best-performing baseline fixes 124.
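The contrastive-pair idea described in the abstract can be illustrated with a minimal sketch. It assumes test inputs are available as strings and uses the similarity ratio from Python's `difflib` as a stand-in for the difference measure; the abstract does not specify the metric ContrastRepair actually uses, and the function names `closest_passing_test` and `build_contrastive_feedback`, the prompt wording, and the `gcd` inputs are all hypothetical.

```python
from difflib import SequenceMatcher


def closest_passing_test(failing_input, passing_inputs):
    """Return the passing input most similar to the failing input.

    Similarity is difflib's ratio in [0, 1]; this metric is an assumption
    made for illustration, not the one used by ContrastRepair.
    """
    def similarity(candidate):
        return SequenceMatcher(None, failing_input, candidate).ratio()

    return max(passing_inputs, key=similarity)


def build_contrastive_feedback(failing_input, passing_input):
    """Format a failing/passing test pair as feedback text for an LLM prompt."""
    return (
        f"The function fails on input: {failing_input}\n"
        f"The function passes on the similar input: {passing_input}\n"
        "Use the difference between the two inputs to locate and fix the bug."
    )


# Hypothetical test inputs for a buggy gcd implementation.
failing = "gcd(0, 5)"
candidates = ["gcd(12, 18)", "gcd(1, 5)", "gcd(48, 36)"]

pair = closest_passing_test(failing, candidates)
print(build_contrastive_feedback(failing, pair))
```

Choosing the passing test that differs least from the failing one narrows the set of input features that could explain the failure, which is the intuition behind minimizing the difference within a test pair.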