Gotham Dataset 2025: A Reproducible Large-Scale IoT Network Dataset for Intrusion Detection and Security Research

This paper presents a dataset of IoT network traffic. The dataset was generated using the Gotham testbed, an emulated large-scale Internet of Things (IoT) network designed to provide a realistic and heterogeneous environment for network security research. The testbed includes 78 emulated IoT devices operating on various protocols, including MQTT, CoAP, and RTSP. Both benign and malicious network traffic were captured in Packet Capture (PCAP) format using tcpdump. Malicious traffic was generated through scripted attacks covering a variety of attack types, such as Denial of Service (DoS), Telnet Brute Force, Network Scanning, CoAP Amplification, and various stages of Command and Control (C&C) communication. Features were then extracted in Python using the TShark tool, and the resulting data were converted to Comma-Separated Values (CSV) format and labelled. The data repository includes the raw network traffic in PCAP format and the processed, labelled data in CSV format. The dataset was collected in a distributed manner: network traffic was captured separately for each IoT device at the interface between the IoT gateway and the device. With its diverse traffic patterns and attack scenarios, this dataset provides a valuable resource for developing Intrusion Detection Systems and security mechanisms tailored to complex, large-scale IoT environments. The dataset is publicly available on Zenodo.
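The processing step described above (PCAP to features via TShark, then CSV with labels) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the field list in `FIELDS`, the function names, and the time-window labelling scheme are all assumptions, since the abstract does not specify which features were extracted or how labels were assigned.

```python
import csv
import io
import subprocess

# Hypothetical field selection; the dataset's actual feature set may differ.
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "ip.proto", "frame.len"]


def build_tshark_command(pcap_path, fields=FIELDS):
    """Build a tshark invocation that prints the chosen fields as CSV."""
    cmd = [
        "tshark", "-r", pcap_path,   # read packets from a capture file
        "-T", "fields",              # emit selected fields only
        "-E", "header=y",            # first output line holds the field names
        "-E", "separator=,",         # comma-separated columns
        "-E", "quote=d",             # wrap values in double quotes
        "-E", "occurrence=f",        # keep only the first occurrence of a field
    ]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def label_rows(csv_text, attack_windows):
    """Label each packet row by timestamp.

    attack_windows is an assumed list of (start_epoch, end_epoch, label)
    tuples; packets outside every window are labelled "benign".
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamp = float(row["frame.time_epoch"])
        row["label"] = "benign"
        for start, end, label in attack_windows:
            if start <= timestamp <= end:
                row["label"] = label
                break
        rows.append(row)
    return rows


def extract_and_label(pcap_path, attack_windows):
    """Run tshark on one per-device capture and return labelled rows."""
    result = subprocess.run(
        build_tshark_command(pcap_path),
        capture_output=True, text=True, check=True,
    )
    return label_rows(result.stdout, attack_windows)
```

Because the traffic was captured separately per device, a function like `extract_and_label` would be called once per device PCAP, with the attack windows taken from whatever schedule the attack scripts followed. Labelling by timestamp alone is the simplest option; a real pipeline would likely also match on attacker and victim addresses.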

