In recent years, machine learning-based anomaly detection (AD) has become an important measure against security threats from Internet of Things (IoT) networks. Machine learning (ML) models for network traffic AD require datasets for training, evaluation and comparison. Because IoT security threats must be represented realistically and kept up to date, new datasets need to be generated continually to train relevant AD models. Since most traffic generation setups are developed with only their authors' own use in mind, replicating traffic generation becomes an additional challenge in creating and maintaining useful datasets. In this work, we propose GothX, a flexible traffic generator that creates both legitimate and malicious traffic for IoT datasets. As a fork of Gotham Testbed, GothX is designed around five requirements: 1) easy configuration of network topology, 2) customization of traffic parameters, 3) automatic execution of legitimate and attack scenarios, 4) IoT network heterogeneity (the current iteration supports MQTT, Kafka and SINETStream services), and 5) automatic labeling of generated datasets. GothX is validated through two use cases: a) re-generation and enrichment of traffic from the IoT dataset MQTTset, and b) automatic execution of a new realistic scenario that includes the exploitation of a CVE specific to the Kafka-MQTT network topology and leads to a DDoS attack. We also contribute two datasets containing mixed traffic, one built from the enriched MQTTset traffic and the other from the attack scenario. We evaluated the scalability of GothX (450 IoT sensors on a single machine), the replication of the use cases and the validity of the generated datasets, confirming the ability of GothX to improve the current state of the art in network traffic generation.