In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
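To make the claimed link between the shuffling percentage and gradient dissimilarity concrete, the following is a minimal toy sketch, not the paper's experimental setup. It assumes scalar data, equal-sized clients, and a quadratic local objective f_i(w) = mean over x of 0.5 (w - x)^2, so each local gradient is w minus the client's sample mean. All function names (`make_clients`, `shuffle_fraction`, `gradient_dissimilarity`, `dissimilarity_ratio`) are hypothetical. In this simplified model, shuffling a fraction p of each client's data through a common pool should shrink the dissimilarity by roughly (1 - p)^2, which is one way to read "quadratically" here; the paper's exact statement may differ.

```python
import random


def make_clients(num_clients, samples_per_client, rng):
    """Client i holds scalar samples centred on i (a stand-in for heterogeneous data)."""
    return [
        [i + rng.gauss(0.0, 0.1) for _ in range(samples_per_client)]
        for i in range(num_clients)
    ]


def shuffle_fraction(clients, p, rng):
    """Each client sends a fraction p of its samples to a common pool; the pool
    is shuffled and every client receives as many samples as it sent."""
    kept, pool, counts = [], [], []
    for data in clients:
        data = data[:]               # copy so the caller's lists are untouched
        rng.shuffle(data)
        k = int(round(p * len(data)))
        pool.extend(data[:k])
        kept.append(data[k:])
        counts.append(k)
    rng.shuffle(pool)
    shuffled, start = [], 0
    for own, k in zip(kept, counts):
        shuffled.append(own + pool[start:start + k])
        start += k
    return shuffled


def gradient_dissimilarity(clients, w=0.0):
    """Average squared deviation of local gradients from the global gradient.
    For the assumed quadratic objective, the local gradient is w - mean(client data)."""
    grads = [w - sum(data) / len(data) for data in clients]
    # Clients are equal-sized, so the global gradient is the plain average.
    global_grad = sum(grads) / len(grads)
    return sum((g - global_grad) ** 2 for g in grads) / len(grads)


def dissimilarity_ratio(p, seed=0, num_clients=4, samples_per_client=4000):
    """Dissimilarity after shuffling a fraction p, divided by the value before."""
    rng = random.Random(seed)
    clients = make_clients(num_clients, samples_per_client, rng)
    before = gradient_dissimilarity(clients)
    after = gradient_dissimilarity(shuffle_fraction(clients, p, rng))
    return after / before


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75):
        print(f"p={p:.2f}  ratio={dissimilarity_ratio(p):.3f}  (1-p)^2={(1 - p) ** 2:.3f}")
```

Running the script prints the measured ratio next to (1 - p)^2 for a few values of p; the two should agree up to sampling noise. The sketch shuffles real samples only for illustration; the approach described in the abstract shuffles locally generated synthetic data instead, which this toy does not model.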