Software Vulnerability Prediction Knowledge Transferring Between Programming Languages

From arXiv. 9 pages, 8 figures. Accepted for presentation at the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023), Prague, Czech Republic.

Developing automated and smart software vulnerability detection models has been receiving great attention from both the research and development communities. One of the biggest challenges in this area is the lack of code samples for all programming languages. In this study, we address this issue by proposing a transfer learning technique that leverages available datasets to build a model for detecting common vulnerabilities across programming languages. We use C source code samples to train a Convolutional Neural Network (CNN) model, and then use Java source code samples to adapt and evaluate the learned model. The code samples come from two benchmark datasets: the NIST Software Assurance Reference Dataset (SARD) and the Draper VDISC dataset. The results show that the proposed model detects vulnerabilities in both C and Java code with an average recall of 72%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.
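
The abstract reports its headline result as an average recall. As a minimal sketch of what that metric measures, the snippet below computes recall (the fraction of truly vulnerable samples that the model flags) separately for C and Java and then averages the two. The labels are made-up illustration data, not results from the paper, and the unweighted mean over the two languages is an assumption: the abstract does not say how the average is taken.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for binary labels, where 1 means 'vulnerable'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against a test set with no vulnerable samples.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical ground-truth labels and model predictions (illustration only).
c_true = [1, 1, 1, 1, 0, 0]
c_pred = [1, 1, 1, 0, 0, 1]
java_true = [1, 1, 0, 0]
java_pred = [1, 0, 0, 0]

recall_c = recall(c_true, c_pred)            # 3 of 4 vulnerable C samples found
recall_java = recall(java_true, java_pred)   # 1 of 2 vulnerable Java samples found

# Assumption: unweighted mean across the two languages.
average_recall = (recall_c + recall_java) / 2

print(f"C recall:       {recall_c:.2f}")
print(f"Java recall:    {recall_java:.2f}")
print(f"Average recall: {average_recall:.2f}")
```

Recall ignores false positives (the C sample wrongly flagged above does not lower the score), which is why it is usually read alongside precision when judging a vulnerability detector.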

