In cross-device federated learning (FL) with millions of mobile clients, only a small subset of clients participates in training in each communication round, and Federated Averaging (FedAvg) is the most popular algorithm in practice. Existing analyses of FedAvg usually assume that the participating clients are sampled independently in each round from a uniform distribution, which does not reflect real-world scenarios. This paper introduces a theoretical framework that models client participation in FL as a Markov chain, in order to study optimization convergence when clients have non-uniform and correlated participation across rounds. We apply this framework to analyze a more general and practical pattern: every client must wait a minimum of $R$ rounds (the minimum separation) before re-participating. We prove theoretically and observe empirically that increasing the minimum separation reduces the bias induced by the intrinsic non-uniformity of client availability in cross-device FL systems. Furthermore, we develop an effective debiasing algorithm for FedAvg that provably converges to the unbiased optimal solution under arbitrary minimum separation and an unknown client availability distribution.
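The participation pattern described above can be illustrated with a minimal simulation sketch. This is not the paper's debiasing algorithm; it only demonstrates the claimed effect, assuming a hypothetical linearly increasing availability distribution over clients and weighted sampling without replacement among clients that satisfy the minimum-separation constraint:

```python
import random
from collections import Counter

def simulate(num_clients=50, per_round=5, rounds=5000, min_sep=0, seed=0):
    """Simulate round-by-round client sampling where availability is
    non-uniform and a client must wait `min_sep` rounds after participating
    before it becomes eligible again. Returns the empirical participation
    frequency of each client."""
    rng = random.Random(seed)
    # Hypothetical non-uniform availability weights (linearly increasing).
    weights = [c + 1 for c in range(num_clients)]
    last = [-(min_sep + 1)] * num_clients  # last participation round per client
    counts = Counter()
    for t in range(rounds):
        # Only clients whose last participation is more than min_sep rounds ago.
        pool = [c for c in range(num_clients) if t - last[c] > min_sep]
        pw = [weights[c] for c in pool]
        # Draw per_round distinct clients, weighted by availability.
        for _ in range(min(per_round, len(pool))):
            c = rng.choices(pool, weights=pw, k=1)[0]
            i = pool.index(c)
            pool.pop(i)
            pw.pop(i)
            counts[c] += 1
            last[c] = t
    total = sum(counts.values())
    return [counts[c] / total for c in range(num_clients)]

def bias(freqs):
    """Total-variation distance from the uniform participation distribution,
    used here as a simple proxy for participation bias."""
    n = len(freqs)
    return 0.5 * sum(abs(f - 1.0 / n) for f in freqs)
```

With `min_sep=0`, participation frequencies track the skewed availability weights; with a larger separation (e.g. `min_sep=8`), recently chosen high-availability clients are forced to sit out, so the empirical frequencies move toward uniform and `bias(...)` shrinks, consistent with the abstract's claim.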