The evolution of the Internet and its related communication technologies has consistently increased the risk of cyber-attacks. In this context, a crucial role is played by Intrusion Detection Systems (IDSs), security devices designed to identify and mitigate attacks on modern networks. Data-driven approaches based on Machine Learning (ML) have become increasingly popular for the classification tasks required by signature-based IDSs. However, the ML models typically adopted for this purpose do not properly account for the uncertainty associated with their predictions. This poses significant challenges, as such models tend to produce misleadingly high classification scores both for misclassified inputs and for inputs belonging to unknown classes (e.g. novel attacks), limiting the trustworthiness of existing ML-based solutions. In this paper, we argue that ML-based IDSs should always provide accurate uncertainty quantification to avoid overconfident predictions. An uncertainty-aware classifier would improve closed-set classification performance, would enable Active Learning, and would help recognize inputs of unknown classes as truly unknown, unlocking open-set classification and Out-of-Distribution (OoD) detection capabilities. To verify this claim, we compare various ML-based methods for uncertainty quantification and open-set classification, either specifically designed for or tailored to the domain of network intrusion detection. Moreover, we develop a custom model based on Bayesian Neural Networks that provides reliable uncertainty estimates and improves OoD detection, showing how proper uncertainty quantification can be exploited to significantly enhance the trustworthiness of ML-based IDSs.
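As a rough illustration of how uncertainty quantification can turn a closed-set classifier into an open-set one, the sketch below computes two standard uncertainty measures from Monte Carlo samples of class probabilities: predictive entropy (total uncertainty) and mutual information (the epistemic part). The samples are assumed to come from several stochastic forward passes of a Bayesian classifier, for example one pass per weight sample drawn from a BNN posterior. An input whose predictive entropy exceeds a threshold is labeled unknown. The function names, the `-1` "unknown" label, and the threshold value are hypothetical choices for illustration. This is not the model developed in the paper.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)


def predictive_uncertainty(mc_probs: np.ndarray):
    """mc_probs has shape (S, N, C): S stochastic forward passes
    (e.g. weight samples from a BNN posterior, an assumption here),
    N inputs, C known classes."""
    mean_probs = mc_probs.mean(axis=0)  # (N, C) predictive distribution
    # Total uncertainty: entropy of the averaged predictive distribution.
    entropy = -(mean_probs * np.log(mean_probs + EPS)).sum(axis=1)
    # Expected entropy of the individual samples (aleatoric part).
    expected = -(mc_probs * np.log(mc_probs + EPS)).sum(axis=2).mean(axis=0)
    # Mutual information: disagreement between samples (epistemic part).
    mutual_info = entropy - expected
    return mean_probs, entropy, mutual_info


def open_set_predict(mc_probs: np.ndarray, entropy_threshold: float) -> np.ndarray:
    """Return the argmax class, or -1 (hypothetical 'unknown' label)
    when predictive entropy exceeds the threshold."""
    mean_probs, entropy, _ = predictive_uncertainty(mc_probs)
    labels = mean_probs.argmax(axis=1)
    return np.where(entropy > entropy_threshold, -1, labels)


# Two inputs, three known classes, two posterior samples.
# Input 0: samples agree (confident). Input 1: samples disagree
# (each is confident on its own, but about different classes).
mc_probs = np.array([
    [[0.90, 0.05, 0.05], [0.90, 0.05, 0.05]],
    [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]],
])
_, entropy, mutual_info = predictive_uncertainty(mc_probs)
preds = open_set_predict(mc_probs, entropy_threshold=0.6)
print(preds)  # input 0 -> class 0, input 1 -> -1 (unknown)
```

The second input shows the failure mode described above. Each single sample assigns 0.9 to some class, so a deterministic network would report high confidence. Disagreement across posterior samples, however, raises both the predictive entropy and the mutual information, and the input is rejected as unknown. In practice the threshold would be tuned on validation data rather than fixed by hand.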