This study examines malware detection with machine learning, evaluating several classification models on the Mal-API-2019 dataset. The aim is to strengthen cybersecurity by identifying and mitigating threats more effectively. Both ensemble and non-ensemble methods are explored, including Random Forest, XGBoost, K-Nearest Neighbors (KNN), and Neural Networks. Particular attention is given to data pre-processing, specifically TF-IDF representation and Principal Component Analysis (PCA), and its effect on model performance. The results indicate that the ensemble methods, Random Forest and XGBoost in particular, achieve higher accuracy, precision, and recall than the other models evaluated, which highlights their effectiveness for malware detection. The paper also discusses limitations and future directions, emphasizing the need for continuous adaptation as malware evolves. This research contributes to ongoing discussions in cybersecurity and offers practical insights for building more robust malware detection systems.
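To make the pre-processing idea concrete, the following is a minimal sketch of a TF-IDF representation feeding a K-Nearest Neighbors classifier, written with only the Python standard library. It is not the authors' implementation: the API-call traces, the labels `family_a` and `family_b`, the smoothed IDF formula, the cosine similarity measure, and all function names are assumptions chosen for illustration, and the PCA step mentioned above is omitted for brevity. In practice a library implementation would normally be used instead.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens unseen at fit time are dropped."""
    counts = Counter(tok for tok in doc if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical API-call traces and labels, invented for this example only.
train_traces = [
    "regopenkey regsetvalue regclosekey regsetvalue",
    "regopenkey regqueryvalue regsetvalue",
    "regsetvalue regclosekey createfile",
    "socket connect send recv",
    "socket connect recv closesocket",
    "connect send send recv",
]
train_labels = ["family_a", "family_a", "family_a", "family_b", "family_b", "family_b"]

train_docs = [trace.split() for trace in train_traces]
idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(doc, idf) for doc in train_docs]

query_vec = tfidf_vector("socket connect send".split(), idf)
prediction = knn_predict(query_vec, train_vecs, train_labels, k=3)
print(prediction)
```

Each trace is treated as a bag of API-call tokens, so call order is ignored. Tokens that appear in fewer training traces receive a larger IDF weight, which lets rarer calls influence the similarity more than calls common to most samples.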