The adoption of the Industrial Internet of Things (IIoT) as a complement to Operational Technology (OT) has enabled a new level of standardised data access and process visibility. However, this convergence of Information Technology (IT), OT and IIoT has also introduced new cybersecurity vulnerabilities and risks that must be managed. Artificial Intelligence (AI) is emerging as a powerful tool for monitoring OT/IIoT networks for malicious activity and is a highly active area of research. AI researchers are applying advanced Machine Learning (ML) and Deep Learning (DL) techniques to detect anomalous or malicious activity in network traffic. They typically evaluate their proposed approaches on datasets derived from IoT/IIoT/OT network traffic captures. As a result, suitable datasets for testing these algorithms are in widespread demand. This work systematically reviews publicly available datasets based on network traffic captures. The review categorises the attack types each dataset contains, examines its metadata, and analyses its statistics and complexity. For each dataset, we provide metadata that researchers can use to identify the dataset best suited to their research question and specific ML goals, making dataset selection easier for the community.