Vulnerability detection is crucial for maintaining software security, and recent research has explored the use of Language Models (LMs) for this task. While LMs have shown promising results, their performance has been inconsistent across datasets, particularly when generalizing to unseen code. Moreover, most studies have focused on the C/C++ programming languages, with limited attention given to other popular languages. This paper addresses this gap by investigating the effectiveness of LMs for vulnerability detection in JavaScript, Java, Python, PHP, and Go, with C/C++ included for comparison. We use the CVEFixes dataset to create a diverse collection of language-specific vulnerabilities and preprocess the data to ensure quality and integrity. We fine-tune and evaluate state-of-the-art LMs on each of the selected languages and find that detection performance varies significantly across them. JavaScript exhibits the best performance, with considerably better and more practical detection capabilities than C/C++. We also examine the relationship between code complexity and detection performance across the six languages and find only a weak correlation between code complexity metrics and the models' F1 scores.