Machine Learning (ML) has become a valuable asset for solving many real-world tasks. For Network Intrusion Detection (NID), however, scientific advances in ML are still viewed with skepticism by practitioners. This disconnect is due to the intrinsically limited scope of research papers, many of which primarily aim to demonstrate new methods "outperforming" prior work, oftentimes overlooking the practical implications of deploying the proposed solutions in real systems. Unfortunately, the value of ML for NID depends on a plethora of factors, such as hardware, that are often neglected in the scientific literature. This paper aims to reduce practitioners' skepticism towards ML for NID by "changing" the evaluation methodology adopted in research. After elucidating which "factors" influence the operational deployment of ML in NID, we propose the notion of "pragmatic assessment", which enables practitioners to gauge the real value of ML methods for NID. Then, we show that the current state of research hardly allows one to estimate the value of ML for NID. As a constructive step forward, we carry out a pragmatic assessment. We re-assess existing ML methods for NID, focusing on the classification of malicious network traffic, and consider hundreds of configuration settings, diverse adversarial scenarios, and four hardware platforms. Our large and reproducible evaluations enable estimating the quality of ML for NID. We also validate our claims through a user study with security practitioners.