As cyber-attacks become more sophisticated, improving the robustness of Machine Learning (ML) models must be a priority for enterprises of all sizes. To reliably compare the robustness of different ML models for cyber-attack detection in enterprise computer networks, the models must be evaluated under standardized conditions. This work presents a methodical adversarial robustness benchmark of multiple decision tree ensembles, using constrained adversarial examples generated from standard datasets. The robustness of regularly and adversarially trained Random Forest (RF), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), and Explainable Boosting Machine (EBM) models was evaluated on the original CICIDS2017 dataset, a corrected version of it designated as NewCICIDS, and the HIKARI dataset, which contains more recent network traffic. NewCICIDS led to models with better performance, especially XGB and EBM, but RF and LGBM were less robust against the more recent cyber-attacks of HIKARI. Overall, the robustness of the models to adversarial cyber-attack examples was improved without affecting their generalization to regular traffic, enabling reliable detection of suspicious activity without costly increases in false alarms.