Automated program repair (APR) techniques have made substantial progress and can now produce genuinely correct fixes in scenarios that were well beyond their capabilities only a few years ago. Nevertheless, even when an APR technique can find a correct fix for a bug, it still risks ranking that fix below other patches that are plausible (they pass all available tests) but incorrect. This can seriously hurt the technique's practical effectiveness, as the user has to peruse more patches before finding the correct one. This paper presents PrevaRank, a technique that ranks the plausible patches produced by any APR technique according to their feature similarity with historical programmer-written fixes for similar bugs. PrevaRank implements simple heuristics, which help make it scalable and applicable to any APR tool that produces plausible patches. In our experimental evaluation, after training PrevaRank on the fix history of 81 open-source Java projects, we used it to rank patches produced by 8 Java APR tools on 168 Defects4J bugs. PrevaRank consistently improved the ranking of correct fixes: for example, it ranked a correct fix within the top-3 positions in 27% more cases than the original tools did. Other experimental results indicate that PrevaRank works robustly with a variety of APR tools and bugs, with negligible overhead.
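To make the idea of ranking by feature similarity concrete, the following is a minimal sketch and not PrevaRank's actual implementation. The abstract does not specify the features or the similarity measure, so everything here is an assumption made for illustration: the feature names (such as `add_null_check`), the use of cosine similarity over feature counts, and the helper names `cosine`, `rerank`, and `in_top_k` are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(patches: dict[str, Counter], history: list[Counter]) -> list[str]:
    """Order patch ids by their best similarity to any historical fix.

    ASSUMPTION: "best match against history" is an illustrative scoring
    rule, not the one used by the paper. Ties keep the original tool
    order because sorted() is stable.
    """
    def score(patch_id: str) -> float:
        return max((cosine(patches[patch_id], h) for h in history), default=0.0)

    return sorted(patches, key=score, reverse=True)


def in_top_k(ranking: list[str], correct_id: str, k: int = 3) -> bool:
    """True if the correct patch appears within the first k positions."""
    return correct_id in ranking[:k]


# Hypothetical feature vectors mined from programmer-written fixes.
history = [
    Counter({"add_null_check": 1, "modify_condition": 1}),
    Counter({"change_operator": 1}),
]

# Hypothetical plausible patches, in the order the APR tool emitted them.
patches = {
    "p1": Counter({"delete_statement": 1}),
    "p2": Counter({"modify_condition": 1, "add_null_check": 1}),
    "p3": Counter({"change_operator": 1, "delete_statement": 1}),
}

ranking = rerank(patches, history)
print(ranking)
print(in_top_k(ranking, "p2", k=1))
```

In this toy input, the patch whose features match a historical fix most closely moves to the front, while a patch sharing no features with the history drops to the end. A top-k check such as `in_top_k` is the kind of measure behind the top-3 result reported above.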