In recent years, the rapid development of deep learning has opened new prospects for vulnerability detection. Many vulnerability detection methods convert source code into images for detection, yet they often overlook the quality of the generated images. Because vulnerability images, unlike the images used in object detection, lack clear and continuous contours, Convolutional Neural Networks (CNNs) tend to lose semantic information during convolution and pooling. This paper therefore proposes a pixel row oversampling method based on code line concatenation that generates more continuous code features, addressing the discontinuity in code image coloration. Building on this method, we propose the vulnerability detection system VulMCI and evaluate it on the SARD and NVD datasets. Experimental results show that VulMCI outperforms seven state-of-the-art vulnerability detectors (Checkmarx, FlawFinder, RATS, VulDeePecker, SySeVR, VulCNN, and Devign). Compared with other image-based methods, VulMCI improves several metrics, including a 2.877\% increase in True Positive Rate (TPR), a 5.446\% increase in True Negative Rate (TNR), and a 5.91\% increase in Accuracy (ACC). On the NVD real-world dataset, VulMCI achieves an average accuracy of 5.162\%, confirming its value in practical vulnerability detection applications.
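For readers unfamiliar with the three metrics named above, the following minimal sketch shows their standard definitions in terms of confusion-matrix counts. The function name `detection_metrics` and the example counts are hypothetical and purely illustrative; they are not taken from the paper or its experiments, and the abstract does not state whether the reported increases are absolute percentage points or relative gains.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (TPR, TNR, ACC) from confusion-matrix counts.

    tp: vulnerable samples flagged as vulnerable
    fp: non-vulnerable samples flagged as vulnerable
    tn: non-vulnerable samples flagged as non-vulnerable
    fn: vulnerable samples flagged as non-vulnerable
    """
    positives = tp + fn          # all truly vulnerable samples
    negatives = tn + fp          # all truly non-vulnerable samples
    total = positives + negatives

    # Guard against empty classes so the sketch never divides by zero.
    tpr = tp / positives if positives else 0.0
    tnr = tn / negatives if negatives else 0.0
    acc = (tp + tn) / total if total else 0.0
    return tpr, tnr, acc


# Illustrative counts only (hypothetical, not results from the paper).
tpr, tnr, acc = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"TPR={tpr:.3f} TNR={tnr:.3f} ACC={acc:.3f}")
```

Reporting TPR and TNR alongside accuracy matters on imbalanced data, where a detector can score high accuracy while missing most vulnerable samples.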