Automated generate-and-validate (G&V) program repair techniques typically rely on hard-coded rules, only fix bugs following specific patterns, and are hard to adapt to different programming languages. We propose ENCORE, a new G&V technique, which uses ensemble learning on convolutional neural machine translation (NMT) models to automatically fix bugs in multiple programming languages. We take advantage of the randomness in hyper-parameter tuning to build multiple models that fix different bugs and combine them using ensemble learning. This new convolutional NMT approach outperforms the standard long short-term memory (LSTM) approach used in previous work, as it better captures both local and long-distance connections between tokens. Our evaluation on two popular benchmarks, Defects4J and QuixBugs, shows that ENCORE fixed 42 bugs, including 16 that have not been fixed by existing techniques. In addition, ENCORE is the first G&V repair technique to be applied to four popular programming languages (Java, C++, Python, and JavaScript), fixing a total of 67 bugs across five benchmarks.
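The ensemble generate-and-validate idea described above can be illustrated with a minimal sketch. This is not ENCORE's implementation: the two "models" are hard-coded stand-ins for differently tuned NMT models, and the names `ensemble_candidates`, `generate_and_validate`, and `passes_tests`, as well as the round-robin merging strategy, are assumptions made purely for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical model interface: buggy line in, ranked candidate fixes out.
Model = Callable[[str], List[str]]


def ensemble_candidates(models: List[Model], buggy_line: str) -> List[str]:
    """Merge candidates from all models round-robin by rank, dropping duplicates."""
    per_model = [model(buggy_line) for model in models]
    merged: List[str] = []
    seen = set()
    max_len = max((len(c) for c in per_model), default=0)
    for rank in range(max_len):
        for candidates in per_model:
            if rank < len(candidates) and candidates[rank] not in seen:
                seen.add(candidates[rank])
                merged.append(candidates[rank])
    return merged


def generate_and_validate(
    models: List[Model],
    buggy_line: str,
    passes_tests: Callable[[str], bool],
) -> Optional[str]:
    """Return the first merged candidate that passes validation, else None."""
    for patch in ensemble_candidates(models, buggy_line):
        if passes_tests(patch):
            return patch
    return None


# Stand-in models (assumed): each proposes a different set of fixes.
def model_a(line: str) -> List[str]:
    return ["return a * b", "return b - a"]


def model_b(line: str) -> List[str]:
    return ["return a * b", "return a + b"]


def passes_tests(patch: str) -> bool:
    """Toy validation step: compile the patched body and run two test cases."""
    namespace: dict = {}
    try:
        exec(f"def add(a, b):\n    {patch}", namespace)
        return namespace["add"](2, 3) == 5 and namespace["add"](-1, 1) == 0
    except Exception:
        return False


buggy = "return a - b"
print(generate_and_validate([model_a], buggy, passes_tests))
print(generate_and_validate([model_a, model_b], buggy, passes_tests))
```

In this toy setting, `model_a` alone never proposes a patch that passes the tests, while the ensemble does, because `model_b` contributes a candidate the other model misses. That mirrors, in miniature, the motivation for combining models that fix different bugs.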