Existing federated learning (FL) approaches rest on the unrealistic assumption that client-side data is fully annotated with ground-truth labels. Moreover, improving training efficiency while preserving detection accuracy is a major challenge in highly heterogeneous, resource-constrained IoT networks, and the communication cost between clients and the server cannot be ignored. In this paper, we therefore propose Federated Semi-Supervised and Semi-Asynchronous (FedS3A) learning for anomaly detection in IoT networks. First, we adopt the more realistic assumption that labeled data is available only at the server, and we use pseudo-labeling to implement federated semi-supervised learning, in which a dynamic weight on supervised learning balances the supervised learning at the server against the unsupervised learning at the clients. Second, we propose a semi-asynchronous model update and staleness-tolerant distribution scheme that trades off round efficiency against detection accuracy; the staleness of local models and the participation frequency of clients are used to adjust each client's contribution to the global model. Third, a group-based aggregation function is proposed to handle non-IID data. Finally, difference transmission based on sparse matrices is adopted to reduce the communication cost. Extensive experimental results show that FedS3A achieves more than 98% accuracy even when the data is non-IID, outperforms classic FL-based algorithms in both detection performance and round efficiency, and reduces the communication cost by more than 50%.
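To make the idea of sparse difference transmission concrete, the following is a minimal sketch and not the FedS3A implementation. It assumes that model weights are flattened into a list of floats, that only entries whose change exceeds a cutoff are transmitted, and that the sparse update is represented as an index-to-difference mapping. The names `encode_delta`, `apply_delta`, and `threshold` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical sparse representation: flat weight index -> change in value.
SparseDelta = Dict[int, float]


def encode_delta(old: List[float], new: List[float],
                 threshold: float = 1e-3) -> SparseDelta:
    """Return only the weight changes larger than `threshold` (assumed cutoff)."""
    if len(old) != len(new):
        raise ValueError("weight vectors must have the same length")
    return {
        i: n - o
        for i, (o, n) in enumerate(zip(old, new))
        if abs(n - o) > threshold
    }


def apply_delta(old: List[float], delta: SparseDelta) -> List[float]:
    """Rebuild the sender's weights from the receiver's copy plus the sparse delta.

    Changes at or below the threshold were dropped by the sender, so the
    result is an approximation unless every change exceeded the threshold.
    """
    out = list(old)
    for i, d in delta.items():
        out[i] += d
    return out


if __name__ == "__main__":
    previous = [0.5, -1.0, 2.0, 0.0]   # weights both sides already hold
    updated = [0.5, -0.75, 2.0, 0.25]  # weights after local training
    delta = encode_delta(previous, updated)
    print(delta)                         # only 2 of 4 entries are sent
    print(apply_delta(previous, delta))  # receiver reconstructs the update
```

In this sketch the payload shrinks in proportion to the number of weights that barely change between rounds; the actual saving depends on the model and on the cutoff, both of which are assumptions here.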