Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance repair performance. While these techniques mainly focus on single-line or hunk-level repair, they face significant challenges in real-world application due to their limited repair task scope and the cost of statement-level fault localization. In contrast, the more practical function-level APR, which broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, investigating the effects of the few-shot learning mechanism and of auxiliary repair-relevant information. Specifically, we adopt six widely studied LLMs and construct a benchmark on both the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly supplying auxiliary repair-relevant information to LLMs significantly improves function-level repair performance. Inspired by these findings, we propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage auxiliary repair-relevant information for advancing repair performance. The evaluation results demonstrate that SRepair correctly fixes 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without requiring costly statement-level fault localization information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, has never been achieved by any prior APR technique.