We present a comprehensive study on applying machine learning to detect distributed denial-of-service (DDoS) attacks that use large-scale Internet of Things (IoT) systems. While prior work and existing DDoS attacks have largely focused on individual nodes transmitting packets at a high volume, we investigate more sophisticated, futuristic attacks that use large numbers of IoT devices and camouflage the attack by having each node transmit at a volume typical of benign traffic. We introduce new correlation-aware architectures that take into account the correlation of traffic across IoT nodes, and we compare the effectiveness of centralized and distributed detection models. We extensively analyze the proposed architectures by evaluating five different neural network models trained on a dataset derived from a 4060-node real-world IoT system. We observe that a long short-term memory (LSTM) model and a transformer-based model, in conjunction with the architectures that use correlation information across the IoT nodes, provide higher performance (in terms of F1 score and binary accuracy) than the other models and architectures, especially when the attacker camouflages itself by following the benign traffic distribution on each transmitting node. For instance, with the LSTM model, the distributed correlation-aware architecture achieves an F1 score of 81% against an attacker that camouflages its attack as benign traffic, compared to 35% for the architecture that does not use correlation information. We also investigate the performance of heuristics for selecting a subset of nodes to share their data in correlation-aware architectures so as to meet resource constraints.