Automated Program Repair has attracted significant research attention in recent years, leading to diverse techniques that follow two main directions: search-based and semantic-based program repair. Search-based techniques often struggle to identify correct fixes because of the vast search space, while semantic-based approaches are constrained by the capabilities of the underlying semantic analyser, which limits their scalability. In this paper, we propose NEVERMORE, a novel learning-based mechanism inspired by the adversarial nature of bugs and fixes. NEVERMORE is built upon the Generative Adversarial Network architecture and trained on historical bug fixes to generate repairs that closely mimic human-produced fixes. Our empirical evaluation on 500 real-world bugs demonstrates the effectiveness of NEVERMORE in bug fixing: it generates repairs that match human fixes for 21.2% of the examined bugs. Moreover, we evaluate NEVERMORE on the Defects4J dataset, where our approach generates repairs for 4 bugs that remained unresolved by state-of-the-art baselines. NEVERMORE also fixes another 8 bugs that were resolved by only a subset of these baselines. Finally, we conduct an in-depth analysis of the impact of input and training styles on NEVERMORE's performance, revealing how the chosen style influences the model's bug-fixing capabilities.