The rapid growth of Internet of Things (IoT) technologies and ever-evolving attack vectors have dramatically increased cyber-security risks. AI-based Intrusion Detection Systems (IDSs) for distributed IoT systems are commonly implemented in a centralised manner. However, this approach may violate data privacy and limit IDS scalability. Therefore, intrusion detection solutions in IoT ecosystems need to move in a decentralised direction. Federated Learning (FL) has attracted significant interest in recent years due to its ability to perform collaborative learning while preserving data confidentiality and locality. Nevertheless, most FL-based IDSs for IoT systems are designed under unrealistic data distribution conditions. To that end, we design an experiment representative of the real world and evaluate the performance of an FL-based IDS. For our experiments, we rely on TON-IoT, a realistic IoT network traffic dataset, associating each IP address with a single FL client. Additionally, we explore pre-training and investigate various aggregation methods to mitigate the impact of data heterogeneity. Lastly, we benchmark our approach against a centralised solution. The comparison shows that the heterogeneous nature of the data has a considerable negative impact on the model's performance when trained in a distributed manner. However, with a pre-trained initial global FL model, we demonstrate a performance improvement of over 20% (F1-score) compared to a randomly initialised global model.
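To make the experimental setup more concrete, the following minimal Python sketch illustrates the two mechanisms the abstract describes: assigning each IP address to its own FL client, and aggregating client models into a global model. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the aggregation methods studied, so sample-size-weighted averaging (FedAvg-style) is used here as one common choice. The field name `src_ip`, the helper names `partition_by_ip` and `fedavg`, and the toy records are all hypothetical and do not reflect the TON-IoT schema.

```python
from collections import defaultdict


def partition_by_ip(records, ip_key="src_ip"):
    """Group flow records into one shard per IP address (one FL client each).

    `ip_key` is a hypothetical field name chosen for this sketch.
    """
    shards = defaultdict(list)
    for rec in records:
        shards[rec[ip_key]].append(rec)
    return dict(shards)


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("no training samples across clients")
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Toy flow records (illustrative only; not real TON-IoT data).
flows = [
    {"src_ip": "10.0.0.1", "label": 0},
    {"src_ip": "10.0.0.1", "label": 1},
    {"src_ip": "10.0.0.1", "label": 0},
    {"src_ip": "10.0.0.2", "label": 1},
]

shards = partition_by_ip(flows)
sizes = [len(shard) for shard in shards.values()]

# Stand-ins for the parameter vectors each client would obtain from local training.
local_weights = [[1.0, 2.0], [5.0, 6.0]]
global_weights = fedavg(local_weights, sizes)
print(sizes, global_weights)
```

Because clients are defined by IP address, shard sizes and label mixes can differ sharply between clients, as they do even in this toy data. That skew is the kind of data heterogeneity the abstract identifies as harmful to the distributed model. In such a scheme, pre-training would replace a random starting vector for the global model with weights learned beforehand.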