Vulnerability fixes in open source software (OSS) usually follow the coordinated vulnerability disclosure model, under which fixes are committed silently before they are publicly disclosed. The delay between a fix and its disclosure can expose OSS users to risk, as malicious parties might exploit the software before the fix is publicly known. Therefore, it is important to identify vulnerability fixes early and automatically. Existing methods classify vulnerability fixes by learning code change representations from commits, typically by concatenating code changes, which does not effectively highlight nuanced differences. Additionally, previous approaches fine-tune the code embedding model and the classification model separately, which limits overall effectiveness. We propose VFDelta, a lightweight yet effective framework that embeds the code before and after a change using independent models, with surrounding code as context. By performing element-wise subtraction on these embeddings, we capture fine-grained changes. Our architecture allows joint training of the embedding and classification models, optimizing overall performance. Experiments demonstrate that VFDelta achieves up to 0.33 F1 score and 0.63 CostEffort@5, improving over state-of-the-art methods by 77.4% and 7.1%, respectively. Ablation analysis confirms the importance of our code change representation in capturing small changes. We also expand the dataset and introduce a temporal split to simulate real-world scenarios; in this setting, VFDelta significantly outperforms the baselines VulFixMiner and MiDas across all metrics.
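To make the delta representation concrete, the following is a minimal sketch and not the VFDelta implementation. It assumes a toy hashed bag-of-tokens encoder in place of a learned code embedding model, and every name in it (`ToyEncoder`, `delta_representation`, `fix_probability`, `VOCAB`, `DIM`) is hypothetical. It shows only the shape of the idea: two independently parameterized encoders embed the code before and after the change, the embeddings are subtracted element-wise, and a linear classifier scores the difference.

```python
import math
import random
import zlib

VOCAB = 64  # hashed vocabulary size (illustrative assumption)
DIM = 8     # embedding size (illustrative assumption)


def bag_of_tokens(code: str) -> list[float]:
    """Hash whitespace-separated tokens into a fixed-size count vector."""
    counts = [0.0] * VOCAB
    for token in code.split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        counts[zlib.crc32(token.encode("utf-8")) % VOCAB] += 1.0
    return counts


class ToyEncoder:
    """Hypothetical stand-in for a learned code embedding model."""

    def __init__(self, seed: int) -> None:
        rng = random.Random(seed)
        # Random projection; a real encoder would have trainable weights.
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(VOCAB)] for _ in range(DIM)
        ]

    def embed(self, code: str) -> list[float]:
        counts = bag_of_tokens(code)
        return [sum(w * c for w, c in zip(row, counts)) for row in self.weights]


def delta_representation(enc_before, enc_after, before, after):
    """Element-wise difference between the after and before embeddings."""
    e_before = enc_before.embed(before)
    e_after = enc_after.embed(after)
    return [a - b for a, b in zip(e_after, e_before)]


def fix_probability(delta, clf_weights, bias=0.0):
    """Logistic score over the delta vector (classifier is untrained here)."""
    score = sum(w * d for w, d in zip(clf_weights, delta)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Changed line plus one line of surrounding context (made-up example).
before_code = "char buf[8]; strcpy(buf, src);"
after_code = "char buf[8]; strncpy(buf, src, sizeof(buf) - 1);"

# Independent encoders: separate parameters for the two code versions.
encoder_before = ToyEncoder(seed=0)
encoder_after = ToyEncoder(seed=1)

delta = delta_representation(encoder_before, encoder_after, before_code, after_code)
classifier_weights = [0.1] * DIM  # placeholder weights, not learned
probability = fix_probability(delta, classifier_weights)
print(len(delta), round(probability, 3))
```

The delta vector has `DIM` entries, whereas concatenating the two embeddings would give `2 * DIM` and leave the classifier to discover the difference on its own. In a jointly trained setup, a single loss on `probability` would presumably update both encoders and the classifier together; this sketch performs no training.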