Automated program repair (APR) aims to fix software bugs automatically without human debugging effort and plays a crucial role in software development and maintenance. Despite its promise, APR is still challenged by the long-standing overfitting problem (i.e., a generated patch is plausible, passing the available tests, but overfitting). Various techniques have thus been proposed to address this problem. Recently, researchers have employed BERT to extract code features, which are then used to train a classifier for patch correctness prediction. However, BERT is used only for feature extraction and does not benefit from the classifier's training process, potentially producing sub-optimal vector representations for patched code snippets. In this paper, we propose APPT, a pre-trained model-based automated patch correctness assessment technique that leverages both pre-training and fine-tuning. APPT adopts a pre-trained model as the encoder stack, followed by an LSTM stack and a deep learning classifier. More importantly, the pre-trained model is fine-tuned jointly with the other components as an end-to-end pipeline, fully adapting it to reasoning about patch correctness. We conduct extensive experiments on 1,183 Defects4J patches, and the results show that APPT achieves a prediction accuracy of 79.7% and a recall of 83.2%, outperforming CACHE by 4.3% and 6.7%, respectively. An additional investigation on 49,694 real-world patches shows that APPT achieves the best performance compared with existing representation learning techniques. We further investigate the impact of each component and find that all of them contribute positively to APPT; e.g., the fine-tuning process and the LSTM stack increase the F1-score by 10.22% and 4.11%, respectively. We also show that adopting more advanced pre-trained models yields further substantial improvements, highlighting the generalizability of APPT.
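The accuracy, recall and F1-score figures reported here follow the standard binary-classification definitions. As a minimal sketch, they can be computed from ground-truth and predicted patch labels as shown below. The function name `patch_metrics` and the 0/1 label encoding are hypothetical, and which class (correct or overfitting patches) is treated as positive is an assumption the abstract does not state, so it is left as a parameter.

```python
from typing import Dict, Sequence


def patch_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary patch-correctness labels.

    `positive` selects which label counts as the positive class (hypothetical
    encoding); it changes precision, recall and F1 but not accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts relative to the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against division by zero.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0]       # hypothetical ground truth
    predictions = [1, 1, 0, 0, 0, 0]  # hypothetical classifier output
    print(patch_metrics(labels, predictions))
```

Swapping the `positive` argument shows why the choice matters: on the toy labels in the usage example, precision and recall trade places, while accuracy stays the same.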