The adoption of the Industrial Internet of Things (IIoT) as a complementary technology to Operational Technology (OT) has enabled a new level of standardised data access and process visibility. This convergence of Information Technology (IT), OT, and IIoT has also created new cybersecurity vulnerabilities and risks that must be managed. Artificial Intelligence (AI) is emerging as a powerful tool for monitoring OT/IIoT networks for malicious activity, and it is a highly active area of research. Researchers are applying advanced Machine Learning (ML) and Deep Learning (DL) techniques to detect anomalous or malicious activity in network traffic, and they typically measure the performance of their proposed approaches on datasets derived from IoT/IIoT/OT network traffic captures. There is therefore a widespread need for datasets for algorithm testing. This work systematically reviews publicly available datasets based on network traffic captures, including a categorisation of the attack types they contain, a review of their metadata, and statistical as well as complexity analysis. Each dataset is analysed to provide metadata that lets researchers select the dataset best suited to their research question and their specific ML goals more easily.