Automatic program repair (APR) techniques have the potential to reduce the manual effort of uncovering and repairing program defects during code review (CR). However, the limited accuracy and considerable time cost of existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advances in Large Language Models (LLMs) have improved their ability to comprehend both natural and programming languages, enabling them to generate patches based on review comments. This paper presents a comprehensive investigation into how LLMs can be used effectively to repair CR defects. Various prompts are designed and compared across mainstream LLMs on two distinct datasets, one collected from human reviewers and one from automated checkers. Experimental results show a repair rate of 72.97% with the best prompt, a substantial improvement in the effectiveness and practicality of automatic repair techniques.