ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs

Automated Program Repair (APR) aims to automatically generate patches that fix software bugs. Recent advances in Large Language Models (LLMs), such as ChatGPT, have yielded encouraging results in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR depends on the quality of the feedback information. In this paper, we propose ContrastRepair, a novel conversation-based APR approach that augments conversation-driven APR by providing LLMs with contrastive test pairs. A test pair consists of a failing test and a passing test, which together offer contrastive feedback to the LLM. Our key insight is to minimize the difference between the generated passing test and the given failing test, which better isolates the root cause of the bug. By providing informative and specific feedback, ContrastRepair enables the LLM to produce effective bug fixes. ContrastRepair is implemented on top of the state-of-the-art LLM ChatGPT, and it iteratively interacts with ChatGPT until plausible patches are generated. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4J, QuixBugs, and HumanEval-Java. The results demonstrate that ContrastRepair significantly outperforms existing methods, achieving a new state of the art in program repair. For instance, across Defects4J 1.2 and 2.0, ContrastRepair correctly repairs 143 of the 337 bug cases, while the best-performing baseline fixes 124.
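To make the idea of a contrastive test pair concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the difference between tests is measured, so the sketch uses the character-level similarity ratio from the standard library's `difflib`. The function names (`similarity`, `pick_contrastive_pair`, `build_feedback`) and the feedback wording are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1]; 1.0 means the strings are identical.

    Assumption: character-level similarity stands in for whatever
    difference measure the paper actually uses.
    """
    return SequenceMatcher(None, a, b).ratio()


def pick_contrastive_pair(failing_test: str, passing_tests: list[str]) -> tuple[str, str]:
    """Pair the failing test with the passing test that differs from it the least."""
    if not passing_tests:
        raise ValueError("need at least one passing test")
    closest = max(passing_tests, key=lambda t: similarity(failing_test, t))
    return failing_test, closest


def build_feedback(failing_test: str, passing_test: str) -> str:
    """Format the pair as feedback for the next conversation turn.

    The template below is hypothetical, not the prompt used in the paper.
    """
    return (
        "The following test fails:\n"
        f"{failing_test}\n"
        "A very similar test passes:\n"
        f"{passing_test}\n"
        "Use the difference between them to locate and fix the bug."
    )


if __name__ == "__main__":
    failing = "assert is_leap(1900) == False"
    candidates = [
        "assert is_leap(2004) == True",
        "assert is_leap(1901) == False",
        "assert sort([3, 1]) == [1, 3]",
    ]
    print(build_feedback(*pick_contrastive_pair(failing, candidates)))
```

In a conversation-driven setting, the text returned by `build_feedback` would be sent to the model along with the buggy code, and the exchange would repeat until a candidate patch passes the test suite. That loop is omitted here because its details are not given in the abstract.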

