Software vulnerabilities pose critical security risks, demanding prompt and effective mitigation strategies. While advances in Automated Program Repair (APR) have primarily targeted general software bugs, vulnerability patching, a security-critical subset of APR, remains underexplored. This paper investigates the potential of the pre-trained language models CodeBERT and CodeT5 for automated vulnerability patching across diverse datasets and five programming languages. We evaluate these models on their accuracy, computational efficiency, and how the length of vulnerable code patches affects performance. Our findings reveal promising accuracy levels, particularly for CodeT5 on datasets with complex vulnerability patterns, while CodeBERT demonstrates strengths in handling fragmented or context-limited datasets. CodeT5 further showcases superior efficiency, making it well suited for large-scale applications. However, both models struggle to maintain performance as patch length increases, highlighting the complexity of handling extended code segments in program repair aimed specifically at fixing vulnerabilities. This study benchmarks model performance, highlights key limitations, and offers insights to improve automated vulnerability patching for practical security applications.