Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance repair performance. While these techniques mainly focus on single-line or hunk-level repair, they face significant challenges in real-world applications due to their limited repair task scope and their reliance on costly statement-level fault localization. However, the more practical function-level APR, which broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, including an investigation of the effect of the few-shot learning mechanism and of auxiliary repair-relevant information. Specifically, we adopt six widely studied LLMs and construct a benchmark from the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly providing auxiliary repair-relevant information to LLMs significantly increases function-level repair performance. Inspired by these findings, we propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage auxiliary repair-relevant information for improved repair performance. The evaluation results demonstrate that SRepair can correctly fix 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without requiring costly statement-level fault localization information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, is the first time any APR technique has achieved this.