Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent convergence analyses of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which converges slowly in wall-clock time under high system and statistical heterogeneity. This paper designs an adaptive client sampling algorithm for FL over wireless networks that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probabilities. Based on the bound, we analytically establish the relationship between the total learning time and the sampling probabilities under an adaptive bandwidth allocation scheme, which results in a non-convex optimization problem. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Our solution reveals the impact of system and statistical heterogeneity parameters on the optimal client sampling design. Moreover, our solution shows that as the number of sampled clients increases, the total convergence time first decreases and then increases, because a larger sampling number reduces the number of rounds needed for convergence but lengthens the expected time per round due to limited wireless bandwidth. Experimental results from both a hardware prototype and simulation demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes.
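As an illustration of sampling clients with arbitrary probabilities, the minimal sketch below draws K clients with replacement from a non-uniform distribution q and rescales each sampled update by p_i / (K q_i), a standard importance-weighting step that keeps the aggregate equal, in expectation, to the full-participation aggregate. The abstract does not spell out the aggregation rule, so this weighting, the scalar "models", and the names `sample_clients`, `aggregate`, and `full_aggregate` are assumptions made for illustration, not the paper's implementation.

```python
import random


def sample_clients(q, k, rng):
    """Draw k client indices with replacement according to probabilities q."""
    return rng.choices(range(len(q)), weights=q, k=k)


def aggregate(global_w, local_ws, p, q, sampled):
    """Importance-weighted aggregate over the sampled clients (assumed rule).

    Each sampled update (local - global) is scaled by p[i] / (k * q[i]), so the
    expectation over the sampling equals sum_i p[i] * (local_ws[i] - global_w).
    """
    k = len(sampled)
    delta = sum(p[i] / (k * q[i]) * (local_ws[i] - global_w) for i in sampled)
    return global_w + delta


def full_aggregate(global_w, local_ws, p):
    """Aggregate with every client participating, weighted by data share p."""
    return global_w + sum(pi * (wi - global_w) for pi, wi in zip(p, local_ws))


# Hypothetical toy values: three clients with scalar local models.
p = [0.5, 0.3, 0.2]          # data-size weights (sum to 1)
q = [0.2, 0.3, 0.5]          # non-uniform sampling probabilities (sum to 1)
local_ws = [1.0, 2.0, 4.0]   # local models after one round of training
global_w = 0.0

rng = random.Random(0)
trials = 20000
avg = sum(
    aggregate(global_w, local_ws, p, q, sample_clients(q, 2, rng))
    for _ in range(trials)
) / trials

print("full participation:", full_aggregate(global_w, local_ws, p))
print("average of sampled aggregates:", avg)
```

Averaged over many rounds, the sampled aggregate approaches the full-participation value, which is why q can be tuned for wall-clock time without biasing the update; the choice of q still affects the variance of each round's aggregate.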