Automated Program Repair has attracted significant research attention in recent years, leading to diverse techniques that fall into two main directions: search-based and semantics-based program repair. Search-based techniques must explore a vast search space, which makes correct fixes hard to identify, while semantics-based approaches are constrained by the capabilities of the underlying semantic analyser, which limits their scalability. In this paper, we propose NEVERMORE, a novel learning-based mechanism inspired by the adversarial nature of bugs and fixes. NEVERMORE is built upon the Generative Adversarial Network architecture and is trained on historical bug fixes to generate repairs that closely mimic human-produced fixes. Our empirical evaluation on 500 real-world bugs demonstrates the effectiveness of NEVERMORE in bug fixing: it achieves a BLEU-4 score of 42% and generates repairs that match the human fixes for 21.2% of the examined bugs. Moreover, we evaluate NEVERMORE on the Defects4J dataset, where our approach generates repairs for 4 bugs that remained unresolved by state-of-the-art baselines. NEVERMORE also fixes another 8 bugs that were resolved by only a subset of these baselines. Finally, we conduct an in-depth analysis of the impact of input and training styles on NEVERMORE's performance, revealing how the chosen style influences the model's bug-fixing capabilities.
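To make the two reported metrics concrete, the following minimal sketch shows one common way to compute a corpus-level BLEU-4 score and an exact-match rate over tokenised patches. It is an illustration under stated assumptions, not the evaluation code used for NEVERMORE: the whitespace tokenisation, the single reference per bug, the absence of smoothing, and the function names `corpus_bleu4` and `exact_match_rate` are all hypothetical choices made here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references):
    """Corpus-level BLEU-4 with one reference per candidate and no smoothing."""
    matched = [0] * 4  # clipped n-gram matches for orders 1..4
    total = [0] * 4    # candidate n-gram counts for orders 1..4
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, 5):
            c, r = ngrams(cand, n), ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(count, r[g]) for g, count in c.items())
            total[n - 1] += sum(c.values())
    if min(matched) == 0:
        return 0.0  # without smoothing, an order with no matches gives BLEU = 0
    # Geometric mean of the four modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    # Brevity penalty: penalise candidates shorter than their references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


def exact_match_rate(candidates, references):
    """Fraction of generated patches that are token-identical to the human fix."""
    pairs = list(zip(candidates, references))
    return sum(c == r for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: two human fixes and two generated patches.
    human = ["if ( x != null ) return x ;".split(),
             "return a + b ;".split()]
    generated = ["if ( x == null ) return x ;".split(),
                 "return a + b ;".split()]
    print("BLEU-4:", corpus_bleu4(generated, human))
    print("Exact match:", exact_match_rate(generated, human))
```

BLEU-4 rewards partial n-gram overlap with the human fix, whereas the exact-match rate counts only patches identical to it, which is why the two figures can differ substantially on the same set of bugs.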