A Survey of Learning-based Automated Program Repair

Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With recent advances in deep learning (DL), a growing number of APR techniques use neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, in which buggy code snippets (the source language) are automatically translated into fixed code snippets (the target language). Because DL can learn hidden relationships from datasets of previous bug fixes, learning-based APR techniques have achieved remarkable performance. In this paper, we provide a systematic survey of the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail its crucial components: the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss the widely adopted datasets and evaluation metrics and outline existing empirical studies. We also examine several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. Finally, we highlight several practical guidelines for applying DL techniques in future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our paper can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at https://github.com/QuanjunZhang/AwesomeLearningAPR.
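The workflow named in the abstract can be pictured with a toy pipeline. The sketch below is purely illustrative and is not taken from the survey: every name in it (`Patch`, `localize`, `generate`, `rank`, `validate`, `repair`) is hypothetical, the fault localizer is a trivial heuristic, and the patch generator is a hand-written operator swap with made-up scores standing in for a trained NMT model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Patch:
    code: str     # full candidate program text
    score: float  # made-up number standing in for a model's likelihood

# Toy buggy program: subtraction where addition was intended.
BUGGY = "def add(a, b):\n    return a - b\n"

def localize(source: str) -> int:
    """Toy fault localization: flag the first line containing 'return'."""
    for i, line in enumerate(source.splitlines()):
        if "return" in line:
            return i
    return 0

def generate(source: str, line_no: int) -> List[Patch]:
    """Stand-in for an NMT decoder: propose operator swaps on the flagged line."""
    lines = source.splitlines()
    candidates = []
    for op, score in (("*", 0.2), ("+", 0.7), ("/", 0.1)):
        patched = lines.copy()
        patched[line_no] = lines[line_no].replace("-", op)
        candidates.append(Patch("\n".join(patched) + "\n", score))
    return candidates

def rank(patches: List[Patch]) -> List[Patch]:
    """Patch ranking: most likely candidate first."""
    return sorted(patches, key=lambda p: p.score, reverse=True)

def validate(patch: Patch, tests: List[Callable[[Dict], bool]]) -> bool:
    """Patch validation: execute the candidate and run every test against it."""
    namespace: Dict = {}
    try:
        exec(patch.code, namespace)
        return all(test(namespace) for test in tests)
    except Exception:
        return False

def repair(source: str, tests: List[Callable[[Dict], bool]]) -> Optional[Patch]:
    """Run localization, generation, ranking, and validation in sequence."""
    line_no = localize(source)
    for patch in rank(generate(source, line_no)):
        if validate(patch, tests):
            return patch  # first candidate that passes all tests
    return None

# Tests act as the (partial) specification of the intended behavior.
TESTS = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](0, 0) == 0,
]

if __name__ == "__main__":
    fixed = repair(BUGGY, TESTS)
    print(fixed.code if fixed else "no plausible patch found")
```

A candidate that passes the test suite is only plausible; it may still overfit the tests, which is why the workflow treats patch correctness as a separate phase after validation.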

