Federated learning (FL) was recently proposed to securely train models on data held across multiple locations ("clients") under the coordination of a central server. Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients, and a decrease in training accuracy induced by non-iid local distributions ("client drift"). In this work we propose and analyze AREA, a new stochastic (sub)gradient algorithm that is robust to client drift and uses asynchronous communication to speed up convergence in the presence of stragglers. Moreover, to the best of our knowledge, AREA is the first method that is guaranteed to converge under arbitrarily long delays and, without the use of delay-adaptive stepsizes, converges to an error neighborhood whose size depends only on the variance of the stochastic (sub)gradients used, and is therefore independent of both the heterogeneity of the local datasets and the length of the client delays. Our numerical results confirm our theoretical analysis and suggest that AREA outperforms state-of-the-art methods when the local data are highly non-iid.