The Internet of Things (IoT) is growing rapidly, and so is the need to protect IoT devices against cyberattacks. In this scenario, Intrusion Detection Systems (IDSs) play a crucial role, and data-driven IDSs based on machine learning (ML) have recently attracted growing interest from the research community. Conventional ML-based IDSs rely on a centralized architecture in which IoT devices share their data with a central server for model training; we instead propose a novel approach based on federated learning (FL). However, conventional FL is ineffective in the considered scenario because of the high statistical heterogeneity of the data collected by IoT devices. To overcome this limitation, we propose a three-tier FL-based architecture in which IoT devices are clustered according to their statistical properties. Clustering decisions are made by a novel entropy-based strategy, which helps improve model training performance. We tested our solution on the CIC-ToN-IoT dataset: our clustering strategy improves intrusion detection performance by up to 17% in F1-score over a conventional FL approach, while also significantly reducing the number of training rounds.
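The abstract does not specify the entropy measure, the clustering rule, or the aggregation scheme, so the following is only a minimal sketch under stated assumptions. It assumes Shannon entropy over each device's local label distribution, hypothetical entropy thresholds (`edges`) for grouping devices, and FedAvg-style sample-weighted averaging at both the cluster tier and the global tier. All function names and parameters are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of one device's local label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cluster_by_entropy(device_labels, edges):
    """Group devices into buckets by label entropy.

    `edges` is a sorted list of hypothetical thresholds (an assumption):
    a device lands in bucket k when its entropy is >= exactly k of them.
    """
    clusters = {}
    for device_id, labels in device_labels.items():
        bucket = sum(label_entropy(labels) >= e for e in edges)
        clusters.setdefault(bucket, []).append(device_id)
    return clusters


def weighted_average(models, weights):
    """FedAvg-style weighted mean of flat parameter lists."""
    total = sum(weights)
    return [
        sum(w * m[i] for m, w in zip(models, weights)) / total
        for i in range(len(models[0]))
    ]


def three_tier_aggregate(clusters, local_models, sample_counts):
    """Tier 1: devices. Tier 2: per-cluster average. Tier 3: global average."""
    cluster_models, cluster_sizes = [], []
    for members in clusters.values():
        sizes = [sample_counts[d] for d in members]
        cluster_models.append(
            weighted_average([local_models[d] for d in members], sizes)
        )
        cluster_sizes.append(sum(sizes))
    return weighted_average(cluster_models, cluster_sizes)
```

In this sketch a device that sees only one traffic class has entropy 0, while a device with a balanced mix of four classes has entropy 2 bits, so the two end up in different clusters. A realistic system would likely run several training rounds inside each cluster before the global step; here each tier is collapsed to a single averaging pass to keep the structure visible.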