In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
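To make the quadratic claim concrete, the following is a minimal NumPy sketch under a toy model that is our assumption and not the paper's experimental setup. Each client holds samples around its own mean and minimizes f_i(w) = 0.5 * mean ||w - x||^2, so its gradient is w minus its sample mean. Each client sends a fraction p of its samples to a common pool, which is redistributed uniformly at random. Under this model a client's data is roughly a (1 - p) : p mixture of its own and the global distribution, so the gradient dissimilarity should shrink by about (1 - p)^2. The names `gradient_dissimilarity`, `shuffle_fraction`, and `run_demo` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def gradient_dissimilarity(client_data, w):
    """Mean squared distance between client gradients and the global gradient.

    Assumes the toy objective f_i(w) = 0.5 * mean ||w - x||^2, whose gradient
    is w - mean(x), and assumes all clients hold the same number of samples.
    """
    grads = np.stack([w - x.mean(axis=0) for x in client_data])
    global_grad = grads.mean(axis=0)
    return float(np.mean(np.sum((grads - global_grad) ** 2, axis=1)))


def shuffle_fraction(client_data, p, rng):
    """Pool a fraction p of every client's samples and redistribute them."""
    n = client_data[0].shape[0]
    k = int(round(p * n))
    if k == 0:
        return [x.copy() for x in client_data]
    kept, pooled = [], []
    for x in client_data:
        perm = rng.permutation(n)
        pooled.append(x[perm[:k]])   # samples this client gives away
        kept.append(x[perm[k:]])     # samples this client keeps
    # Shuffle the pooled rows, then hand each client k of them back.
    pool = rng.permutation(np.concatenate(pooled, axis=0))
    return [
        np.concatenate([kept[i], pool[i * k:(i + 1) * k]], axis=0)
        for i in range(len(client_data))
    ]


def run_demo(seed=0):
    """Return {p: dissimilarity after shuffling / dissimilarity before}."""
    rng = np.random.default_rng(seed)
    num_clients, n, d = 10, 2000, 5
    # Heterogeneous clients: each draws samples around a different mean.
    means = rng.normal(0.0, 3.0, size=(num_clients, d))
    clients = [m + rng.normal(size=(n, d)) for m in means]
    w = np.zeros(d)
    base = gradient_dissimilarity(clients, w)
    results = {}
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        mixed = shuffle_fraction(clients, p, rng)
        results[p] = gradient_dissimilarity(mixed, w) / base
    return results


if __name__ == "__main__":
    for p, ratio in run_demo().items():
        print(f"p={p:.2f}  measured ratio={ratio:.4f}  (1-p)^2={(1 - p) ** 2:.4f}")
```

In this sketch the measured ratio tracks (1 - p)^2 only approximately, because the clients hold finite samples. The pool could equally be filled with locally generated synthetic samples instead of raw data, which is the direction the proposed practical approach takes.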