Impact of Code Language Models on Automated Program Repair

Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work that evaluates CLMs' fixing capabilities or fine-tunes CLMs for the APR task. First, this work is the first to evaluate ten CLMs on four APR benchmarks, showing that, surprisingly, the best CLM, as is, fixes 72% more bugs than the state-of-the-art deep-learning (DL)-based APR techniques. Second, one of the four APR benchmarks was created in this paper to avoid data leakage and ensure a fair evaluation. Third, this is the first work to fine-tune CLMs with APR training data, showing that fine-tuning brings a 31%-1,267% improvement to CLMs and enables them to fix 46%-164% more bugs than existing DL-based APR techniques. Fourth, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, while fine-tuned CLMs may over-rely on them. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs. This work points to promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs. It also raises awareness of the need for fair and comprehensive evaluations of CLMs, and calls for more transparent reporting of the open-source repositories used in pre-training data to address the data leakage problem.
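As a reading aid for the percentages in the abstract, the following minimal sketch shows how a relative improvement such as "72% more bugs" is conventionally computed from two bug-fix counts. The function name `relative_improvement` and all counts are hypothetical and chosen only to make the arithmetic visible; they are not taken from the paper, and the paper's exact computation may differ.

```python
def relative_improvement(baseline_fixed: int, new_fixed: int) -> float:
    """Return the percentage increase of new_fixed over baseline_fixed.

    Hypothetical helper for illustration; not part of the paper's artifacts.
    """
    if baseline_fixed <= 0:
        raise ValueError("baseline_fixed must be positive")
    # Multiply before dividing so that exact cases stay exact in floating point.
    return (new_fixed - baseline_fixed) * 100 / baseline_fixed


# Hypothetical counts: a baseline tool fixes 50 bugs, a CLM fixes 86.
clm_vs_baseline = relative_improvement(50, 86)

# Hypothetical counts: a CLM fixes 3 bugs as is, and 41 after fine-tuning.
# A very small starting count is how a four-digit percentage can arise.
finetuned_vs_as_is = relative_improvement(3, 41)

print(f"{clm_vs_baseline:.0f}% more bugs fixed")
print(f"{finetuned_vs_as_is:.0f}% improvement")
```

Under this definition, a large upper bound such as 1,267% is consistent with a model that fixes very few bugs before fine-tuning, so the relative gain should be read alongside the absolute counts.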

