Vulnerability detection is crucial for maintaining software security, and recent research has explored the use of Language Models (LMs) for this task. While LMs have shown promising results, their performance has been inconsistent across datasets, particularly when generalizing to unseen code. Moreover, most studies have focused on the C/C++ programming languages, with limited attention given to other popular languages. This paper addresses this gap by investigating the effectiveness of LMs for vulnerability detection in JavaScript, Java, Python, PHP, and Go, with C/C++ included for comparison. We use the CVEFixes dataset to create a diverse collection of language-specific vulnerabilities and preprocess the data to ensure quality and integrity. We fine-tune and evaluate state-of-the-art LMs on each of the selected languages and find that detection performance varies significantly across them. JavaScript exhibits the best performance, with considerably better and more practical detection capabilities than C/C++. We also examine the relationship between code complexity and detection performance across the six languages and find only a weak correlation between code complexity metrics and the models' F1 scores.