This paper summarizes a malware detection project based on the Canadian Institute for Cybersecurity's MalMemAnalysis-2022 dataset. The project explored the effectiveness and efficiency of machine learning techniques for two tasks: binary classification (i.e., benign or malicious) and multi-class classification that further distinguishes three malware sub-types (i.e., benign, ransomware, spyware, or Trojan horse). XGBoost was selected as the final model for both tasks for its trade-off between strong detection capability and fast inference speed. On the testing subset, the binary classifier achieved an accuracy and F1 score of 99.98\%, while the multi-class classifier reached an accuracy of 87.54\% and an F1 score of 81.26\%, with an average F1 score over the three malware sub-types of 75.03\%. XGBoost is also efficient in terms of classification speed: classifying 50 samples sequentially takes about 37.3 milliseconds in the binary setting and about 43.2 milliseconds in the multi-class setting. These results contribute to the development of accurate, real-time detectors of obfuscated malware, with the goal of improving online privacy and safety. *This project was completed as part of ELEC 877 (AI for Cybersecurity) in the Winter 2024 term.
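The multi-class results above report three different numbers (accuracy, an overall F1 score, and an F1 score averaged over the malware sub-types only), and it may help to see how such figures can differ on the same predictions. The following is a minimal sketch, assuming an unweighted (macro) average of per-class F1; the abstract does not state which averaging scheme was used. The label sequences are toy data invented for illustration and are not results from the project, and the helper names `per_class_f1` and `mean_f1` are hypothetical.

```python
from collections import Counter

# Class names follow the abstract; everything else here is illustrative.
CLASSES = ["benign", "ransomware", "spyware", "trojan"]
MALWARE = ["ransomware", "spyware", "trojan"]


def per_class_f1(y_true, y_pred, classes):
    """Return {class: F1}, treating each class one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def mean_f1(scores, classes):
    """Unweighted mean of F1 over the chosen classes (assumed macro average)."""
    return sum(scores[c] for c in classes) / len(classes)


# Toy labels (assumption): benign is always right, sub-types get confused.
y_true = ["benign"] * 4 + ["ransomware"] * 2 + ["spyware"] * 2 + ["trojan"] * 2
y_pred = (["benign"] * 4 + ["ransomware", "spyware"]
          + ["spyware", "trojan"] + ["trojan", "ransomware"])

scores = per_class_f1(y_true, y_pred, CLASSES)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_all = mean_f1(scores, CLASSES)       # average over all four classes
macro_malware = mean_f1(scores, MALWARE)   # average over malware sub-types only

print(f"accuracy={accuracy:.3f} macro_f1={macro_all:.3f} "
      f"malware_f1={macro_malware:.3f}")
```

In this toy case the benign class is classified perfectly while the sub-types are confused with one another, so accuracy exceeds the four-class average, which in turn exceeds the malware-only average. This is the same ordering as the reported figures, although the sketch says nothing about why the project's numbers came out as they did.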