Malware is growing in both variety and volume as digital technology becomes more widely adopted. This growth has become a critical challenge for cybersecurity and has prompted a large body of research. Machine learning algorithms are used for malware detection because they can uncover hidden patterns in large datasets. Deep learning algorithms, with their multi-layered structure, overcome limitations of traditional machine learning approaches. This study applies deep learning techniques such as CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) to classify and identify malware in a dataset of API call sequences. Their performance is compared with that of conventional machine learning methods, namely SVM (Support Vector Machine), RF (Random Forest), KNN (K-Nearest Neighbors), XGB (Extreme Gradient Boosting), and GBC (Gradient Boosting Classifier), all on the same dataset. The results show that both the deep learning and the conventional machine learning algorithms achieve high accuracy, reaching up to 99% in some cases.
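To make the classification task concrete, the following is a minimal sketch of one of the conventional baselines named in the abstract, KNN, applied to API call sequences. It is not the study's implementation: the toy sequences, the labels, the bag-of-calls feature encoding, and the helper names (`TRAIN`, `to_counts`, `distance`, `knn_predict`, `accuracy`) are all assumptions introduced for illustration.

```python
from collections import Counter
import math

# Hypothetical toy data (invented for illustration): each sample is a
# sequence of API-call IDs plus a label (1 = malware, 0 = benign).
TRAIN = [
    ([5, 5, 9, 2, 5], 1),
    ([5, 9, 9, 2, 5], 1),
    ([5, 2, 9, 5, 5], 1),
    ([1, 3, 3, 7, 1], 0),
    ([1, 1, 3, 7, 7], 0),
    ([3, 1, 7, 3, 1], 0),
]


def to_counts(sequence):
    """Bag-of-calls features: how often each API-call ID occurs."""
    return Counter(sequence)


def distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    # Counter returns 0 for IDs that are missing from a vector.
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def knn_predict(sequence, train=TRAIN, k=3):
    """Predict a label by majority vote among the k nearest training samples."""
    query = to_counts(sequence)
    neighbours = sorted(
        train, key=lambda item: distance(query, to_counts(item[0]))
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(samples, train=TRAIN, k=3):
    """Fraction of (sequence, label) pairs that are classified correctly."""
    correct = sum(knn_predict(seq, train, k) == label for seq, label in samples)
    return correct / len(samples)


if __name__ == "__main__":
    held_out = [([5, 9, 2, 5, 5], 1), ([1, 3, 7, 1, 3], 0)]
    print("accuracy on toy held-out set:", accuracy(held_out))
```

The count-based encoding discards the order of the calls. Sequence models such as CNNs and RNNs can use that order, which is one reason to compare the two families on the same dataset.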