Federated Learning (FL) is a novel distributed machine learning approach that leverages data from Internet of Things (IoT) devices while preserving data privacy. However, current FL algorithms are challenged by non-independent and identically distributed (non-IID) data, which raises communication costs and degrades model accuracy. To address this statistical imbalance in FL, we propose a clustered data sharing framework in which cluster heads share part of their data with trusted associates through device-to-device (D2D) communication. To dilute the data skew on nodes, we formulate a joint clustering and data sharing problem on a privacy-preserving constrained graph. To handle the strong coupling of decisions on the graph, we devise a distribution-based adaptive clustering algorithm (DACA) based on three derived cluster-forming conditions, which ensures the maximum yield of data sharing. Experiments show that the proposed framework improves the convergence and model accuracy of FL on non-IID datasets in a communication-limited environment.