APPT: Boosting Automated Patch Correctness Prediction via Fine-tuning Pre-trained Models

Automated program repair (APR) aims to fix software bugs automatically, without human debugging effort, and plays a crucial role in software development and maintenance. Although promising, APR is still challenged by a long-standing overfitting problem (i.e., a generated patch is plausible but overfits the available test suite). Various techniques have thus been proposed to address the overfitting problem. Recently, researchers have employed BERT to extract code features, which are then used to train a classifier for patch correctness prediction. However, BERT is used only for feature extraction and does not itself benefit from the classifier's training process, potentially producing sub-optimal vector representations for patched code snippets. In this paper, we propose APPT, a pre-trained model-based automated patch correctness assessment technique that combines pre-training and fine-tuning. APPT adopts a pre-trained model as the encoder stack, followed by an LSTM stack and a deep learning classifier. More importantly, the pre-trained model is fine-tuned together with the other components as a single pipeline, adapting it specifically to reasoning about patch correctness. We conduct an extensive experiment on 1,183 Defects4J patches, and the results show that APPT achieves a prediction accuracy of 79.7% and a recall of 83.2%, outperforming CACHE by 4.3% and 6.7%, respectively. Our additional investigation on 49,694 real-world patches shows that APPT achieves the best performance among existing representation learning techniques. We further investigate the impact of each component and find that they all contribute positively to APPT, e.g., the fine-tuning process and the LSTM stack increase F1-score by 10.22% and 4.11%, respectively. We also show that adopting more advanced pre-trained models can provide further substantial improvement, highlighting the generalizability of APPT.
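
The accuracy, recall, and F1-score figures quoted above are standard binary classification metrics. As a minimal sketch of how such numbers are derived from a set of patch predictions, the snippet below computes them from scratch. The label encoding (1 for an overfitting patch, 0 for a correct one), the function name `evaluate`, and the toy labels are assumptions made for illustration; they are not taken from the paper or its evaluation scripts.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute binary classification metrics for patch correctness labels.

    Assumption for illustration: `positive` marks an overfitting patch.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


# Toy data (hypothetical): ground-truth labels and classifier predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Recall measures the share of truly overfitting patches that the classifier flags, while F1-score is the harmonic mean of precision and recall.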
