The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach to fighting cyber-crime. In this paper, we investigate how state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied eight machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in the public databases of the Node Security Project and the Snyk platform, and from code-fixing patches on GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best-performing models. We also examined the effect of various re-sampling strategies for handling the imbalanced nature of the dataset. The best-performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree- and forest-based classifiers, and SVM were competitive, with F-measures over 0.70. Although the F-measures did not vary significantly across the re-sampling strategies, the distribution of precision and recall did change. Without re-sampling, the models tended to favor high precision, while the re-sampling strategies produced more balanced information retrieval (IR) measures.
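The reported F-measure is the harmonic mean of precision and recall; a minimal sketch of the computation, using the values reported for the best KNN model, illustrates how the three figures relate:

```python
def f_measure(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the best-performing KNN model
precision, recall = 0.91, 0.66
f1 = f_measure(precision, recall)  # close to the reported 0.76
```

The harmonic mean penalizes imbalance between the two components, which is why a high-precision, lower-recall model such as the one above still lands near 0.76 rather than near the arithmetic mean of 0.785.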