Systematic review, analysis, and characterisation of malicious industrial network traffic datasets for aiding Machine Learning algorithm performance testing

8 May 2024

Martin Dobler, Michael Hellwig, Nuno Lopes, Ken Oakley, Mike Winterburn

From arXiv; 28 pages; preprint submitted to Engineering Applications of Artificial Intelligence (Elsevier).

The adoption of the Industrial Internet of Things (IIoT) as a complementary technology to Operational Technology (OT) has enabled a new level of standardised data access and process visibility. This convergence of Information Technology (IT), OT, and IIoT has also created new cybersecurity vulnerabilities and risks that must be managed. Artificial Intelligence (AI) is emerging as a powerful tool to monitor OT/IIoT networks for malicious activity and is a highly active area of research. AI researchers are applying advanced Machine Learning (ML) and Deep Learning (DL) techniques to the detection of anomalous or malicious activity in network traffic. They typically use datasets derived from IoT/IIoT/OT network traffic captures to measure the performance of their proposed approaches. Therefore, there is a widespread need for datasets for algorithm testing. This work systematically reviews publicly available datasets based on network traffic captures, covering categorisation of the attack types they contain, a review of their metadata, and statistical and complexity analysis. Each dataset is analysed to provide metadata that researchers can use to select the dataset best suited to their research question and their specific Machine Learning goals.
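As a rough illustration of the kind of statistical characterisation the abstract mentions, the sketch below summarises the label column of a traffic dataset. The abstract does not say which statistics the authors compute, so the choice of class counts, imbalance ratio, and normalised label entropy is an assumption, and the function name `label_summary` and the example labels are hypothetical.

```python
from collections import Counter
from math import log2


def label_summary(labels):
    """Return class counts, imbalance ratio, and normalised label entropy."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    # Imbalance ratio: size of the largest class divided by the smallest.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    # Shannon entropy of the label distribution, in bits.
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    # Normalise by the maximum possible entropy so 1.0 means perfectly balanced.
    max_entropy = log2(len(counts)) if len(counts) > 1 else 1.0
    return {
        "counts": dict(counts),
        "imbalance_ratio": imbalance_ratio,
        "normalised_entropy": entropy / max_entropy,
    }


# Hypothetical labels, purely for illustration (not taken from any reviewed dataset).
example = ["benign"] * 6 + ["dos"] * 3 + ["scan"] * 1
summary = label_summary(example)
print(summary)
```

A high imbalance ratio or a low normalised entropy would suggest that plain accuracy is a weak performance measure on that dataset, which is one reason such metadata may help when choosing a benchmark.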
