The rapid expansion of the Internet of Things (IoT) and Industrial IoT (IIoT) has created a large, heterogeneous attack surface that challenges traditional network security mechanisms. While Federated Learning (FL) offers a privacy-preserving alternative to centralized Intrusion Detection Systems (IDS), standard FL approaches struggle to generalize across diverse device behaviors and typically fail to exploit the large volumes of unlabeled data present in realistic edge environments. To bridge these gaps, we propose CLAD, a framework that combines Clustered Federated Learning (CFL) with a novel Dual-Mode Micro-Architecture ($\text{DM}^2\text{A}$). This unified approach tackles the two primary bottlenecks of IoT security together: device heterogeneity and label scarcity. $\text{DM}^2\text{A}$ consists of a shared encoder followed by two branches, one for unsupervised anomaly detection and one for supervised attack classification, so the framework can learn from both labeled and unlabeled clients. Concurrently, the clustering component dynamically groups devices with similar traffic patterns, preventing global model divergence. As a result, CLAD discards no data and preserves distinct operational patterns. Extensive evaluations show that this integrated approach outperforms state-of-the-art baselines, achieving a 30% relative improvement in detection performance in scenarios where 80% of clients are unlabeled, at only half the communication cost.
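The two mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer types, loss functions, the loss weighting, or the clustering criterion, so the class `DualModeModel`, the function `cluster_average`, the single-layer `tanh` encoder, the mean-squared reconstruction error, the cross-entropy term, and the `weight` parameter are all assumptions chosen for illustration. Cluster assignments are taken as given rather than computed.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DualModeModel:
    """Hypothetical stand-in for a shared encoder with two branches."""

    def __init__(self, n_features, n_latent, n_classes, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_latent, n_features, rng)    # shared by both branches
        self.decoder = init_layer(n_features, n_latent, rng)    # unsupervised branch
        self.classifier = init_layer(n_classes, n_latent, rng)  # supervised branch

    def encode(self, x):
        return [math.tanh(h) for h in linear(*self.encoder, x)]

    def reconstruction_error(self, x):
        """Mean squared reconstruction error, usable as an anomaly score."""
        x_hat = linear(*self.decoder, self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

    def class_probabilities(self, x):
        """Softmax over the classifier logits (shifted for numerical stability)."""
        logits = linear(*self.classifier, self.encode(x))
        peak = max(logits)
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def loss(self, x, label=None, weight=1.0):
        """Unlabeled samples use only the reconstruction term (assumed design);
        labeled samples add a weighted cross-entropy term."""
        total = self.reconstruction_error(x)
        if label is not None:
            total += weight * -math.log(self.class_probabilities(x)[label])
        return total


def cluster_average(client_params, assignments):
    """Average flat parameter vectors separately within each cluster.
    `assignments` is assumed to be provided by some clustering step."""
    grouped = {}
    for params, cluster_id in zip(client_params, assignments):
        grouped.setdefault(cluster_id, []).append(params)
    return {
        cluster_id: [sum(column) / len(column) for column in zip(*members)]
        for cluster_id, members in grouped.items()
    }
```

In this sketch an unlabeled client calls `model.loss(x)` and a labeled client calls `model.loss(x, label=k)`, so both kinds of client update the shared encoder. Averaging per cluster rather than globally keeps one set of parameters per group of similar devices.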