Federated learning (FL) is a distributed machine learning paradigm with high efficiency and a low communication load, since only network parameters or gradients are transmitted. However, data that are not independent and identically distributed (Non-IID) across clients degrade the performance of this paradigm. Furthermore, heterogeneous communication quality significantly affects the accuracy of parameter transmission, degrading the performance of the FL system or even preventing it from converging. This letter proposes a dual-segment clustering (DSC) strategy that first clusters clients according to their heterogeneous communication conditions and then performs a second clustering based on sample size and label distribution, thereby addressing both data heterogeneity and communication heterogeneity. Experimental results show that the proposed DSC strategy improves the convergence rate of FL and achieves higher accuracy in heterogeneous environments than classical clustering algorithms.
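The two-stage idea can be illustrated with a minimal sketch. The abstract does not say how communication conditions are measured or which clustering algorithm each segment uses, so everything below is an assumption: the first segment buckets clients into tiers using a hypothetical per-client SNR (`snr_db`) and fixed `thresholds`, and the second segment runs plain k-means over a feature vector made of normalized sample size plus the label distribution, separately inside each tier. The names `Client`, `comm_segment`, `data_features`, `kmeans`, and `dual_segment_cluster` are illustrative and are not the letter's implementation.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Client:
    cid: int
    snr_db: float            # ASSUMPTION: SNR used as the communication-quality measure
    num_samples: int         # local sample size
    label_hist: List[float]  # local label distribution (fractions summing to 1)


def comm_segment(clients: Sequence[Client], thresholds: Sequence[float]) -> Dict[int, List[Client]]:
    """Segment 1: bucket clients into communication tiers.

    ASSUMPTION: tiers come from fixed SNR thresholds; a client's tier is the
    number of thresholds its SNR meets or exceeds.
    """
    tiers: Dict[int, List[Client]] = defaultdict(list)
    for c in clients:
        tier = sum(c.snr_db >= t for t in thresholds)
        tiers[tier].append(c)
    return dict(tiers)


def data_features(c: Client, max_samples: int) -> List[float]:
    # Normalized sample size followed by the label distribution.
    return [c.num_samples / max_samples] + list(c.label_hist)


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    k = min(k, len(points))
    centers = [points[0][:]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centers))
        centers.append(far[:])
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center (ties go to the lower index).
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def dual_segment_cluster(
    clients: Sequence[Client], thresholds: Sequence[float], k: int
) -> Dict[Tuple[int, int], List[int]]:
    """Segment 2: run k-means on data features separately inside each tier.

    Returns {(tier, cluster_label): [client ids]}.
    """
    max_samples = max(c.num_samples for c in clients)
    clusters: Dict[Tuple[int, int], List[int]] = {}
    for tier, members in sorted(comm_segment(clients, thresholds).items()):
        points = [data_features(c, max_samples) for c in members]
        for c, label in zip(members, kmeans(points, k)):
            clusters.setdefault((tier, label), []).append(c.cid)
    return clusters
```

Because the data-based clustering runs only within a tier, two clients with similar data but very different link quality always end up in different clusters, which matches the "communication first, data second" ordering the abstract describes. Any real deployment would substitute its own link-quality metric and clustering method for the assumed SNR thresholds and k-means.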