Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and have proven effective in software tasks such as code completion, there has been little comprehensive, in-depth work evaluating CLMs' bug-fixing capabilities or fine-tuning CLMs for the APR task. First, this work is the first to evaluate ten CLMs on four APR benchmarks; surprisingly, the best CLM, as is, fixes 72% more bugs than state-of-the-art deep-learning (DL)-based APR techniques. Second, one of the four APR benchmarks was created for this paper to avoid data leakage and ensure a fair evaluation. Third, this is the first work to fine-tune CLMs with APR training data, showing that fine-tuning brings a 31%-1,267% improvement to CLMs and enables them to fix 46%-164% more bugs than existing DL-based APR techniques. Fourth, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, whereas fine-tuned CLMs may over-rely on them. Finally, this work analyzes the size, time, and memory efficiency of different CLMs. This work points to promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs. It also raises awareness of the need for fair and comprehensive evaluations of CLMs and calls for more transparent reporting of the open-source repositories used in pre-training data to address the data leakage problem.