The rapid expansion of diverse network systems, including the Internet of Things (IoT) and the Industrial Internet of Things (IIoT), has led to an increasing range of cyber threats. Robust protection against these threats requires an effective Intrusion Detection System (IDS). For more than a decade, researchers have explored supervised machine learning techniques for building IDS that classify normal and attack traffic. However, building effective IDS models with supervised learning requires a substantial number of both benign and attack samples. Collecting enough attack samples from real-life scenarios is not feasible, since cyber attacks occur only occasionally. Further, an IDS trained and tested on known datasets fails to detect zero-day (unknown) attacks because attack patterns evolve rapidly. To address this challenge, we propose two strategies for semi-supervised-learning-based IDS that do not require attack samples for training: 1) training a supervised machine learning model using randomly and uniformly dispersed synthetic attack samples; 2) building a One Class Classification (OCC) model trained exclusively on benign network traffic. We implemented both approaches and compared their performance on 10 recent benchmark IDS datasets. Our findings demonstrate that the OCC model based on the state-of-the-art anomaly detection technique usfAD significantly outperforms conventional supervised classification and other OCC-based techniques when trained and tested under real-life scenarios, particularly in detecting previously unseen attacks.
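The two training strategies can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation and not usfAD, whose algorithm the abstract does not describe. It assumes every feature has already been min-max scaled to [0, 1], so "uniformly dispersed" synthetic attacks are drawn over the unit hypercube. A 1-nearest-neighbour rule stands in for the supervised classifier, and a nearest-neighbour distance score with a leave-one-out threshold stands in for the one-class model. All function names (`synthetic_attacks`, `train_supervised`, `train_occ`, and so on) and the toy data are hypothetical.

```python
import math
import random

# Assumption: all features are min-max scaled to [0, 1], so uniformly
# dispersed synthetic attacks are drawn from the unit hypercube.

def synthetic_attacks(n, dims, seed=0):
    """Strategy 1 helper (hypothetical): n points drawn uniformly from [0, 1)^dims."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dims)] for _ in range(n)]

def train_supervised(benign, n_synthetic=100, seed=0):
    """Strategy 1: label benign samples 0 and synthetic points 1.
    A 1-NN classifier (stand-in for any supervised model) just stores them."""
    attacks = synthetic_attacks(n_synthetic, len(benign[0]), seed)
    return [(x, 0) for x in benign] + [(x, 1) for x in attacks]

def predict_supervised(model, x):
    """Return the label (1 = attack, 0 = benign) of the nearest training point."""
    _, label = min(model, key=lambda pair: math.dist(pair[0], x))
    return label

def nn_distance(x, samples):
    """Euclidean distance from x to its nearest neighbour in samples."""
    return min(math.dist(x, s) for s in samples)

def train_occ(benign):
    """Strategy 2: fit on benign traffic only (distance-based stand-in, NOT usfAD).
    Threshold = largest leave-one-out nearest-neighbour distance among benign samples."""
    threshold = max(
        nn_distance(x, benign[:i] + benign[i + 1:]) for i, x in enumerate(benign)
    )
    return benign, threshold

def predict_occ(model, x):
    """Flag x as an attack (1) if it lies farther from benign data than the threshold."""
    benign, threshold = model
    return 1 if nn_distance(x, benign) > threshold else 0

# Toy benign traffic: a 3x3 grid of 2-D feature vectors near (0.2, 0.2).
benign = [[a, b] for a in (0.15, 0.2, 0.25) for b in (0.15, 0.2, 0.25)]

clf = train_supervised(benign)
occ = train_occ(benign)

print(predict_occ(occ, [0.21, 0.19]))        # 0: close to benign traffic
print(predict_occ(occ, [0.9, 0.1]))          # 1: far from anything seen in training
print(predict_supervised(clf, [0.9, 0.1]))   # 1: nearest training point is synthetic
```

Neither detector ever sees a real attack sample. One caveat of strategy 1 is visible in the code: uniform synthetic points can also land inside the benign region, where they may turn nearby benign traffic into false alarms, whereas the one-class model is fitted on benign samples alone.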