Redundancy-based automated program repair (APR), which generates patches by referencing existing source code, has gained much attention because it is effective in repairing real-world bugs and produces patches with good interpretability. However, existing approaches either require multi-line similar code to exist or reference existing code at random, so they repair only a small number of bugs while producing many incorrect patches, which hinders their wide application in practice. In this work, we aim to improve the effectiveness of redundancy-based APR by exploring more effective source code reuse methods that increase the number of correct patches and reduce the number of incorrect ones. Specifically, we propose a new repair technique named Repatt, which incorporates a two-level pattern mining process (i.e., at the token and expression levels) to guide patch generation. We conducted an extensive experiment on the widely used Defects4J benchmark and compared Repatt with eight state-of-the-art APR approaches. The results show that, given perfect fault localization, our approach complements existing approaches by repairing 15 unique bugs compared with the latest deep learning-based methods and 19 unique bugs compared with traditional repair methods. In addition, when perfect fault localization is unavailable, as in real practice, Repatt significantly outperforms the baseline approaches by achieving much higher patch precision, i.e., 83.8%. Moreover, we propose an effective patch ranking strategy that combines the strengths of Repatt and the baseline methods. The combined approach repairs 124 bugs when only the Top-1 patches are considered, 39 more than the best-performing repair method. These results demonstrate the effectiveness of our approach for practical use.