Software vulnerabilities pose critical security risks, demanding prompt and effective mitigation strategies. While advances in Automated Program Repair (APR) have primarily targeted general software bugs, vulnerability patching, a security-critical subset of APR, remains underexplored. This paper investigates the potential of the pre-trained language models CodeBERT and CodeT5 for automated vulnerability patching across diverse datasets and five programming languages. We evaluate these models on their accuracy, computational efficiency, and how the length of vulnerable code patches affects performance. Our findings reveal promising accuracy levels, particularly for CodeT5 on datasets with complex vulnerability patterns, while CodeBERT demonstrates strengths in handling fragmented or context-limited datasets. CodeT5 further showcases superior efficiency, making it well suited for large-scale applications. However, both models struggle to maintain performance as patch length increases, highlighting the complexity of handling extended code segments in program repair aimed specifically at fixing vulnerabilities. This study benchmarks model performance, highlights key limitations, and offers insights to improve automated vulnerability patching for practical security applications.