ComPass: Contrastive Learning for Automated Patch Correctness Assessment in Program Repair

Automated program repair (APR) aims to reduce manual debugging effort and plays a vital role in software maintenance. Despite remarkable progress, APR tools still tend to generate overfitting patches, i.e., patches that pass the available test suites but are nevertheless incorrect. This issue, known as patch overfitting, has become a key concern in the APR community, and numerous approaches have been proposed to address it. Very recent work proposes a pre-trained language model (PLM)-based automated patch correctness assessment (APCA) approach, indicating the potential of such PLMs for reasoning about patch correctness. Although promising, that approach is still far from perfect because of limitations in, for example, its training paradigm and training dataset. In this paper, we present ComPass, a PLM-based APCA approach that leverages contrastive learning and data augmentation to address the technical limitations of prior work. Our work is inspired by the opportunity to integrate contrastive learning with recent PLMs in patch correctness assessment, a field where large-scale labeled patches are difficult to obtain. ComPass first applies code transformation rules to generate semantics-preserving code snippets for both the unlabeled pre-training corpus and the labeled fine-tuning patches. ComPass then pre-trains PLMs with contrastive learning, so that they capture the features of code that has the same semantics but different structures. Finally, ComPass integrates the representation embeddings of patch code snippets and jointly fine-tunes the PLMs with a binary classifier to assess patch correctness. Experimental results on 2274 real-world patches from Defects4J demonstrate that ComPass achieves an accuracy of 88.35%, significantly outperforming the state-of-the-art baseline APPT.
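To make the contrastive pre-training step more concrete, the following is a minimal sketch of an InfoNCE-style objective. The abstract does not state which contrastive loss ComPass uses, so the choice of InfoNCE, the temperature value, the helper names (`cosine`, `info_nce_loss`), and the toy two-dimensional embeddings are all assumptions made for illustration. Each original snippet's embedding is pulled toward the embedding of its semantics-preserving transformation, and pushed away from the transformed versions of the other snippets in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, not the paper's code).

    anchors[i] is the embedding of an original snippet and positives[i]
    the embedding of its transformed variant; positives[j] with j != i
    act as in-batch negatives for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Hypothetical embeddings: two original snippets and their transformed variants.
original = [[1.0, 0.0], [0.0, 1.0]]
transformed_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each variant stays close to its source
transformed_swapped = [[0.1, 0.9], [0.9, 0.1]]   # variants paired with the wrong source

aligned_loss = info_nce_loss(original, transformed_aligned)
swapped_loss = info_nce_loss(original, transformed_swapped)
print(aligned_loss < swapped_loss)
```

In this toy setting the loss is small when each snippet's embedding is closest to its own transformed variant and large when the pairing is wrong, which is the behaviour a contrastive pre-training objective is meant to reward.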
