In the Industrial Internet of Things (IoT), large amounts of data are generated every day. Because of privacy and security concerns, it is difficult to gather all of this data in one place to train deep learning models, so federated learning, a distributed machine learning paradigm that protects data privacy, has been widely adopted in IoT. In practical federated learning, however, data distributions usually differ substantially across devices, and this data heterogeneity degrades model performance. Moreover, federated learning in IoT usually involves a large number of devices in training, and the limited communication resources of cloud servers become a training bottleneck. To address these issues, we combine centralized and decentralized federated learning to design a semi-decentralized cloud-edge-device hierarchical federated learning framework that mitigates the impact of data heterogeneity and can be deployed at large scale in IoT. To counter data heterogeneity, we apply an incremental subgradient optimization algorithm within each ring cluster to improve the generalization ability of the ring cluster models. Extensive experiments show that our approach effectively mitigates the impact of data heterogeneity and alleviates the communication bottleneck at cloud servers.
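The abstract does not spell out the update rule, so the following is only a minimal sketch of how an incremental (sub)gradient pass around a ring cluster, followed by cloud-side averaging, might look. Everything in it is an assumption made for illustration: the one-parameter linear model, the squared-error loss (whose gradient is a special case of a subgradient), the plain averaging step, and all names such as `ring_pass`, `train`, and `make_device`. It is not the authors' implementation.

```python
import random


def local_gradient(w, data):
    """Gradient of the mean squared error of the toy model y = w * x on one device."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def ring_pass(w, ring, lr):
    """One incremental pass: the model visits each device in ring order,
    and each device applies a single gradient step on its own local data."""
    for data in ring:
        w = w - lr * local_gradient(w, data)
    return w


def train(clusters, rounds=100, lr=0.05, w0=0.0):
    """Hypothetical hierarchical loop: every ring cluster refines the current
    global model, then the cloud averages the resulting ring models."""
    w = w0
    for _ in range(rounds):
        ring_models = [ring_pass(w, ring, lr) for ring in clusters]
        w = sum(ring_models) / len(ring_models)
    return w


def make_device(rng, x_lo, x_hi, true_w=3.0, n=20):
    """Synthetic device data; each device draws x from its own range,
    which stands in for heterogeneous (non-IID) local distributions."""
    xs = [rng.uniform(x_lo, x_hi) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


rng = random.Random(0)
clusters = [
    [make_device(rng, 0.0, 1.0), make_device(rng, 1.0, 2.0)],    # ring cluster A
    [make_device(rng, -2.0, -1.0), make_device(rng, 0.5, 1.5)],  # ring cluster B
]
w_final = train(clusters)
print(f"estimated w = {w_final:.4f}")
```

In this toy setup the cloud exchanges one model per ring cluster per round rather than one per device, which is the intuition behind the reduced cloud-side communication; the sequential pass inside each ring exposes the model to every device's data before aggregation.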