How Far Can We Go with Practical Function-Level Program Repair?

Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to improve repair performance. These techniques mainly focus on single-line or hunk-level repair, so they face significant challenges in real-world applications because of their limited repair task scope and their reliance on costly statement-level fault localization. In contrast, the more practical function-level APR, which broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, including an investigation of the effects of the few-shot learning mechanism and of auxiliary repair-relevant information. Specifically, we adopt six widely studied LLMs and construct a benchmark from the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly providing auxiliary repair-relevant information to LLMs significantly increases function-level repair performance. Inspired by these findings, we propose an LLM-based function-level APR technique, SRepair, which adopts a dual-LLM framework that leverages auxiliary repair-relevant information to improve repair performance. The evaluation results demonstrate that SRepair correctly fixes 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without requiring costly statement-level fault localization information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, is the first time any APR technique has achieved this.
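
The dual-LLM idea can be illustrated with a minimal Python sketch. It assumes, as a simplification not spelled out in this abstract, that a first model turns the buggy function plus auxiliary information (failing test, error message, comments) into a natural-language repair suggestion, and a second model rewrites the entire function from that suggestion; each candidate is then validated against tests. The names BugContext, suggestion_prompt, patch_prompt, repair and the stub models are hypothetical, and the prompt wording is a placeholder rather than SRepair's actual prompts.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model interface: a prompt goes in, generated text comes out.
LLM = Callable[[str], str]


@dataclass
class BugContext:
    """Auxiliary repair-relevant information for one buggy function (assumed fields)."""
    buggy_function: str
    trigger_test: str = ""
    error_message: str = ""
    comments: str = ""


def suggestion_prompt(ctx: BugContext) -> str:
    # Stage 1 (hypothetical prompt): ask for a natural-language repair suggestion.
    return (
        "Explain the root cause of the bug and how to fix it.\n"
        f"Function:\n{ctx.buggy_function}\n"
        f"Failing test:\n{ctx.trigger_test}\n"
        f"Error message:\n{ctx.error_message}\n"
        f"Comments:\n{ctx.comments}\n"
    )


def patch_prompt(ctx: BugContext, suggestion: str) -> str:
    # Stage 2 (hypothetical prompt): ask for the complete fixed function,
    # so the model only needs to know which function is buggy, not which statement.
    return (
        f"Repair suggestion:\n{suggestion}\n"
        f"Rewrite the entire function with the fix applied:\n{ctx.buggy_function}\n"
    )


def repair(
    ctx: BugContext,
    suggester: LLM,
    patcher: LLM,
    passes_tests: Callable[[str], bool],
    n_suggestions: int = 3,
    n_patches: int = 2,
) -> Optional[str]:
    """Return the first candidate function that passes the tests, or None."""
    for _ in range(n_suggestions):
        suggestion = suggester(suggestion_prompt(ctx))
        for _ in range(n_patches):
            candidate = patcher(patch_prompt(ctx, suggestion))
            if passes_tests(candidate):
                return candidate
    return None


if __name__ == "__main__":
    buggy = "int add(int a, int b) { return a - b; }"
    ctx = BugContext(
        buggy,
        trigger_test="assertEquals(3, add(1, 2));",
        error_message="expected:<3> but was:<-1>",
    )
    # Stub models standing in for real LLM calls.
    suggester = lambda prompt: "The operator '-' should be '+'."
    patcher = lambda prompt: buggy.replace("a - b", "a + b") if "'+'" in prompt else buggy
    passes = lambda src: "a + b" in src
    print(repair(ctx, suggester, patcher, passes))
    # prints: int add(int a, int b) { return a + b; }
```

Sampling several suggestions and several patches per suggestion trades extra model calls for more candidate diversity; the counts used here are arbitrary illustration values, not settings reported for SRepair.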
