Federated learning (FL) was recently proposed to securely train models with data held at multiple locations ("clients") under the coordination of a central server. Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients and a decline in model accuracy under non-iid local data distributions ("client drift"). In this work, we propose and analyze Asynchronous Exact Averaging (AREA), a new stochastic (sub)gradient algorithm that uses asynchronous communication to speed up convergence and enhance scalability, and employs client memory to correct the client drift caused by variations in client update frequencies. To the best of our knowledge, AREA is the first method that is guaranteed to converge under arbitrarily long delays without delay-adaptive stepsizes and that (i) for strongly convex, smooth functions, asymptotically converges to an error neighborhood whose size depends only on the variance of the stochastic gradients used with respect to the number of iterations, and (ii) for convex, non-smooth functions, matches the convergence rate of the centralized stochastic subgradient method up to a constant factor that depends on the average of the individual client update frequencies rather than their minimum (or maximum). Our numerical results validate our theoretical analysis and indicate that AREA outperforms state-of-the-art methods when local data are highly non-iid, especially as the number of clients grows.
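The role of client memory in correcting the bias from unequal update frequencies can be illustrated with a toy example. The sketch below is not the AREA algorithm itself, whose update rules are not given in the abstract. It is a minimal illustration of memory-based incremental averaging, assuming scalar client values, a server that stores the last value received from each client, and a fixed arrival schedule. All names (`naive_async_average`, `memory_async_average`, `targets`, `updates`) are hypothetical.

```python
def naive_async_average(updates, n_clients, init=0.0):
    """Memoryless server: mixes each arriving value in with weight 1/n.

    Clients that report more often pull the aggregate toward themselves.
    """
    x = init
    for _client, value in updates:
        x += (value - x) / n_clients
    return x


def memory_async_average(updates, n_clients, init=0.0):
    """Server with per-client memory (illustrative assumption, not AREA).

    Each arrival replaces that client's previous contribution, so the
    aggregate always equals the mean of the latest value of every client,
    regardless of how often each client reports.
    """
    memory = [init] * n_clients  # last value received from each client
    x = init                     # running mean of the entries in `memory`
    for client, value in updates:
        x += (value - memory[client]) / n_clients
        memory[client] = value
    return x


# Two clients with different local values (a stand-in for non-iid data).
targets = [0.0, 10.0]

# Fixed schedule: client 1 reports once, then client 0 reports nine times,
# repeated 50 times, so client 0 updates nine times as often as client 1.
schedule = ([1] + [0] * 9) * 50
updates = [(c, targets[c]) for c in schedule]

naive_result = naive_async_average(updates, n_clients=2)
memory_result = memory_async_average(updates, n_clients=2)
print("naive :", naive_result)
print("memory:", memory_result)
```

In this toy setting the memoryless aggregate ends up close to the value of the frequently reporting client, while the memory-based aggregate stays at the mean of the two client values. In an actual FL system the scalars would be model parameters produced by local stochastic (sub)gradient steps, which this sketch deliberately omits.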