Botnets can autonomously infect, propagate, communicate and coordinate with other members of the botnet, enabling cybercriminals to exploit the cumulative computing power and bandwidth of the bots to facilitate cybercrime. Traditional detection methods are becoming increasingly ineffective against the various network-based detection evasion methods. These evasion techniques ultimately render signature-based fingerprinting detection infeasible. This research therefore explores the application of network flow-based behavioural modelling to the binary classification of bot network activity, so that detection is independent of the underlying communications architectures, ports, protocols and payload-based detection evasion mechanisms. A comparative evaluation of various machine learning classification methods is conducted to determine the average accuracy of each classifier on botnet datasets such as CTU-13, ISOT 2010 and ISCX 2014. Additionally, hyperparameter tuning was performed using a Genetic Algorithm (GA), with the aim of converging efficiently to the fittest hyperparameter set for each dataset. The bioinspired optimisation of Random Forest (RF) with GA achieved an average accuracy of 99.85% when tested against the three datasets. The model was then developed into a software product. A video of the project and a demo of the software are available on YouTube: https://youtu.be/gNQjC91VtOI
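The GA-based hyperparameter search described in the abstract could look roughly like the following minimal sketch. It is not the authors' implementation: the search space values, the population and mutation settings, and the helper names (`SEARCH_SPACE`, `genetic_search`, `toy_fitness`) are all assumptions made for illustration. The fitness function here is a toy stand-in so that the example runs with the standard library alone; in the setting the abstract describes, fitness would instead be the classification accuracy of a Random Forest trained with the candidate hyperparameters.

```python
import random

# Hypothetical search space for Random Forest hyperparameters (illustrative values only).
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [5, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}


def random_individual(rng):
    """Sample one hyperparameter set uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene comes from either parent with equal probability."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def tournament(scored, rng, k=3):
    """Return the fittest of k randomly chosen (fitness, individual) pairs."""
    return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]


def genetic_search(fitness, pop_size=12, generations=10, elite=2, seed=0):
    """Evolve hyperparameter sets and return (best_individual, best_fitness)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Score and rank the current population, fittest first.
        scored = sorted(
            ((fitness(ind), ind) for ind in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Elitism: carry the top individuals over unchanged.
        next_gen = [ind for _, ind in scored[:elite]]
        # Fill the rest with mutated children of tournament-selected parents.
        while len(next_gen) < pop_size:
            child = crossover(tournament(scored, rng), tournament(scored, rng), rng)
            next_gen.append(mutate(child, rng))
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(params):
    """Stand-in fitness (assumption): fraction of genes matching an arbitrary target.

    A real run would return the validation accuracy of a Random Forest
    built with `params` on the flow-feature dataset.
    """
    target = {
        "n_estimators": 200,
        "max_depth": 20,
        "min_samples_split": 2,
        "max_features": "sqrt",
    }
    return sum(params[name] == value for name, value in target.items()) / len(target)


if __name__ == "__main__":
    best_params, best_score = genetic_search(toy_fitness)
    print(best_params, best_score)
```

Because of elitism, the best fitness in the population never decreases from one generation to the next. To apply the sketch to real flow data, one option (assuming scikit-learn is available) is to replace `toy_fitness` with a function that returns the mean of `cross_val_score(RandomForestClassifier(**params), X, y)` on the extracted flow features.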