Software patches are pivotal in refining and evolving codebases: they fix bugs, close vulnerabilities, and introduce optimizations. Patch descriptions provide detailed accounts of these changes, aiding comprehension and collaboration among developers. However, writing descriptions manually is time-consuming, and the results vary in quality and level of detail. In this paper, we propose PATCHEXPLAINER, an approach that addresses these challenges by framing patch description generation as a machine translation task. PATCHEXPLAINER leverages explicit representations of critical elements, historical context, and syntactic conventions. Moreover, its translation model is designed with an awareness of description similarity. In particular, the model is explicitly trained to recognize and incorporate the similarities among patch descriptions that have been clustered into groups, improving its ability to generate accurate and consistent descriptions across similar patches. Training pursues two objectives: maximizing description similarity and accurately predicting the group to which a description belongs. Our experimental results on a large dataset of real-world software patches show that PATCHEXPLAINER consistently outperforms existing methods, with improvements of up to 189% in BLEU, 5.7× in Exact Match rate, and 154% in Semantic Similarity, affirming its effectiveness in generating software patch descriptions.
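The abstract does not specify how the two training objectives are formulated or combined, so the following is only an illustrative sketch under stated assumptions. It assumes the similarity objective is expressed as a cosine-distance term between an embedding of the generated description and an embedding of a reference description, that group prediction is a standard cross-entropy term over group logits, and that the two are mixed by a hypothetical weight `alpha`. All function names, parameters, and the weighting scheme are introduced here for illustration and are not taken from PATCHEXPLAINER.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against the class index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def dual_objective_loss(pred_emb, ref_emb, group_logits, group_id, alpha=0.5):
    """Hypothetical combined loss (an assumption, not the paper's formulation).

    similarity term: 1 - cos(pred_emb, ref_emb), minimized when the embeddings align
    group term:      cross-entropy for predicting the description's group
    alpha:           assumed mixing weight between the two terms
    """
    similarity_loss = 1.0 - cosine_similarity(pred_emb, ref_emb)
    group_loss = cross_entropy(group_logits, group_id)
    return alpha * similarity_loss + (1.0 - alpha) * group_loss
```

Under these assumptions, the loss approaches zero when the generated description's embedding aligns with the reference and the correct group receives nearly all of the probability mass. In practice such a term would presumably be added to the usual sequence-generation loss of the translation model, but the abstract does not say how.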