Federated learning (FL) is an effective paradigm for distributed environments such as the Internet of Things (IoT), where data from diverse devices with varying functionalities remains localized while contributing to a shared global model. By eliminating the need to transmit raw data, FL inherently preserves privacy. However, the heterogeneous nature of IoT data, stemming from differences in device capabilities, data formats, and communication constraints, poses significant challenges to maintaining both global model performance and privacy. In the context of IoT-based anomaly detection, unsupervised FL offers a promising means of identifying abnormal behavior without centralized data aggregation. Nevertheless, feature heterogeneity across devices complicates model training and optimization, hindering effective implementation. In this study, we propose an efficient unsupervised FL framework that enhances anomaly detection by leveraging the features shared between two distinct IoT datasets, one focused on anomaly detection and the other on device identification, while preserving the features specific to each dataset. To improve transparency and interpretability, we employ explainable AI techniques such as SHapley Additive exPlanations (SHAP) to identify the key features influencing local model decisions. Experiments conducted on real-world IoT datasets demonstrate that the proposed method significantly outperforms conventional FL approaches in anomaly detection accuracy. This work underscores the potential of using shared features from complementary datasets to optimize unsupervised federated learning and achieve superior anomaly detection results in decentralized IoT environments.
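The idea of aggregating only the shared features while keeping dataset-specific features local can be illustrated with a minimal sketch. It is not the framework proposed in the paper: the abstract does not specify the local model, so a per-feature Gaussian model (mean and variance) stands in for it here, and the aggregation is a plain sample-weighted average of parameters in the style of FedAvg. All feature names (`pkt_size`, `duration`, `flow_rate`, `ttl`), function names, and data values are hypothetical.

```python
from statistics import fmean, pvariance

def fit_local(rows):
    """Fit a per-feature (mean, variance) model on one client's local rows."""
    return {
        f: (fmean([r[f] for r in rows]), pvariance([r[f] for r in rows]))
        for f in rows[0]
    }

def aggregate_shared(local_models, sizes, shared):
    """Sample-weighted average of the parameters of the shared features only."""
    total = sum(sizes)
    return {
        f: (
            sum(n * m[f][0] for m, n in zip(local_models, sizes)) / total,
            sum(n * m[f][1] for m, n in zip(local_models, sizes)) / total,
        )
        for f in shared
    }

def merge(local_model, global_part):
    """Overwrite shared-feature parameters; dataset-specific ones stay local."""
    return {**local_model, **global_part}

def anomaly_score(model, row, eps=1e-9):
    """Mean squared z-score over the features known to this client's model."""
    return fmean([(row[f] - mu) ** 2 / (var + eps) for f, (mu, var) in model.items()])

# Hypothetical clients: both observe pkt_size and duration; each also has
# one feature the other lacks (flow_rate for A, ttl for B).
client_a = [
    {"pkt_size": 100.0, "duration": 1.0, "flow_rate": 10.0},
    {"pkt_size": 110.0, "duration": 1.2, "flow_rate": 12.0},
    {"pkt_size": 90.0, "duration": 0.8, "flow_rate": 11.0},
]
client_b = [
    {"pkt_size": 105.0, "duration": 1.1, "ttl": 64.0},
    {"pkt_size": 95.0, "duration": 0.9, "ttl": 60.0},
]

local_a, local_b = fit_local(client_a), fit_local(client_b)
shared = sorted(set(local_a) & set(local_b))  # features present on both clients

# Only model parameters for shared features are exchanged, never raw rows.
global_part = aggregate_shared([local_a, local_b], [len(client_a), len(client_b)], shared)
model_a, model_b = merge(local_a, global_part), merge(local_b, global_part)

normal = {"pkt_size": 100.0, "duration": 1.0, "flow_rate": 11.0}
odd = {"pkt_size": 400.0, "duration": 5.0, "flow_rate": 50.0}
print(anomaly_score(model_a, normal) < anomaly_score(model_a, odd))
```

After one round, both clients hold identical parameters for the shared features, while `flow_rate` and `ttl` remain known only to their respective clients. A real system would replace the Gaussian stand-in with the actual local model and repeat the exchange over multiple rounds.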