Software vulnerabilities are flaws in software systems that threaten the integrity, security, and reliability of modern software and its application data, and they can cause substantial economic losses across industries. Manual vulnerability repair is time-consuming and error-prone. Among the solutions researchers have proposed, learning-based automatic vulnerability repair has attracted widespread attention. However, existing methods tend to improve repair outcomes by learning from more vulnerability data while neglecting the diverse characteristics of vulnerable code, and they suffer from imprecise vulnerability localization. To address these shortcomings, this paper proposes CRepair, an automatic vulnerability repair technique based on a conditional variational autoencoder (CVAE) and aimed at fixing security vulnerabilities in system code. We first preprocess the vulnerability data with a prompt-based method to form the model input. We then apply causal inference techniques to map the vulnerability feature data to probability distributions, and use multi-sample feature fusion to capture diverse vulnerability feature information. Finally, conditional control guides the model in repairing the vulnerabilities. Experimental results show that the proposed method significantly outperforms the baseline models, achieving a perfect repair rate of 52%. The effectiveness of the approach is validated from multiple perspectives, advancing AI-driven code vulnerability repair and showing promising applications.
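The abstract names three generic CVAE ingredients: mapping features to a probability distribution, fusing several samples drawn from it, and conditioning the decoder. The following is a minimal sketch of how these pieces commonly fit together, not CRepair's actual implementation. The function names, the use of averaging as the fusion step, the concatenation-based conditioning, and all vector sizes are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def sample_latents(mu, log_var, num_samples, rng):
    """Draw latent vectors z = mu + sigma * eps (the reparameterization trick)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    return mu + sigma * eps

def fuse_samples(latents):
    """Fuse several latent samples into one vector by averaging (assumed choice)."""
    return latents.mean(axis=0)

def condition_latent(fused, condition):
    """Append a condition vector so a decoder can be steered by it (assumed choice)."""
    return np.concatenate([fused, condition])

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# Hypothetical encoder outputs for one vulnerable function.
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 0.0, 2.0])
log_var = np.zeros(4)
condition = np.array([1.0, 0.0])  # e.g. an encoded vulnerability-type hint

latents = sample_latents(mu, log_var, num_samples=8, rng=rng)
fused = fuse_samples(latents)
decoder_input = condition_latent(fused, condition)
kl_penalty = kl_to_standard_normal(mu, log_var)
```

In a trained model, `mu` and `log_var` would come from an encoder network and `decoder_input` would feed a sequence decoder that emits the repaired code; here they are fixed arrays so the sketch stays self-contained.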