This study examines malware detection with machine learning, evaluating a range of classification models on the Mal-API-2019 dataset. The aim is to strengthen cybersecurity by identifying and mitigating threats more effectively. The models explored span ensemble methods, such as Random Forest and XGBoost, and non-ensemble methods, such as K-Nearest Neighbors (KNN) and Neural Networks. Particular attention is paid to data pre-processing, especially Term Frequency-Inverse Document Frequency (TF-IDF) representation and Principal Component Analysis (PCA), and to its effect on model performance. The results indicate that ensemble methods, particularly Random Forest and XGBoost, achieve higher accuracy, precision, and recall than the other models, highlighting their effectiveness for malware detection. The paper also discusses its limitations and possible future directions, emphasizing that detection systems must adapt continuously to the evolving nature of malware. This research contributes to ongoing work in cybersecurity and offers practical insights for building more robust malware detection systems.
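The TF-IDF representation named in the abstract can be illustrated with a small, dependency-free sketch. It assumes each Mal-API-2019 sample is a whitespace-separated sequence of Windows API call names and uses a hypothetical toy training set (not data from the dataset). TF-IDF vectors are paired with a cosine-similarity KNN classifier, one of the non-ensemble models mentioned above. The IDF variant (smoothed, with L2-normalised vectors) is one common choice and is an assumption, not necessarily the variant used in the study; PCA and the ensemble models are omitted to keep the example self-contained. All function names are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # count each API call at most once per sample
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(doc, idf):
    """Term counts weighted by IDF, then L2-normalised; API calls unseen in training are dropped."""
    counts = Counter(t for t in doc.split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}

def cosine(a, b):
    # Both vectors are L2-normalised, so their dot product is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k training samples most cosine-similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy samples: API call sequences with made-up family labels.
train_docs = [
    "LdrLoadDll GetProcAddress URLDownloadToFileW WinExec",
    "LdrLoadDll URLDownloadToFileW InternetOpenA WinExec",
    "RegOpenKeyExA RegSetValueExA CreateServiceA StartServiceA",
    "RegOpenKeyExA RegSetValueExA CreateServiceA OpenSCManagerA",
]
train_labels = ["downloader", "downloader", "backdoor", "backdoor"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]

query = tfidf_vector("URLDownloadToFileW WinExec GetProcAddress", idf)
print(knn_predict(train_vecs, train_labels, query))  # prints "downloader"
```

In a full pipeline one would more likely use scikit-learn's `TfidfVectorizer`, follow it with a dimensionality-reduction step such as PCA, and train a `RandomForestClassifier` or an XGBoost model on the reduced features; the hand-rolled version above only makes the TF-IDF weighting explicit.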