Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges from unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two prominent algorithms for addressing these challenges. FedAvg performs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for "client drift" in its local updates. Various methods have been proposed to improve the convergence of these two algorithms, but they either make impractical adjustments to the algorithmic structure or rely on the assumption of bounded data heterogeneity. This paper explores the use of momentum to improve the performance of FedAvg and SCAFFOLD. Under full client participation, we show that incorporating momentum allows FedAvg to converge without the assumption of bounded data heterogeneity, even when using a constant local learning rate. This result is novel and fairly surprising, since existing analyses of FedAvg require bounded data heterogeneity even with diminishing local learning rates. Under partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which achieve state-of-the-art convergence rates. Our experimental results support all theoretical findings.
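To make the mechanics described above concrete, the following is a minimal sketch of FedAvg-style training (several local gradient steps per client, then server-side averaging) combined with a generic heavy-ball momentum buffer on the server. It is an illustration under stated assumptions, not the momentum scheme analyzed in this paper: the toy per-client loss `0.5 * (x - center)**2`, the function names `local_sgd` and `fedavg_server_momentum`, and all hyperparameter values are hypothetical choices made for this example.

```python
from statistics import mean


def local_sgd(x, center, lr, steps):
    """Run `steps` gradient steps on the toy client loss 0.5 * (x - center)**2."""
    for _ in range(steps):
        grad = x - center  # exact gradient of the toy quadratic loss
        x -= lr * grad
    return x


def fedavg_server_momentum(centers, rounds=100, local_steps=5,
                           local_lr=0.1, server_lr=1.0, beta=0.5):
    """FedAvg with full participation plus a heavy-ball buffer on the server.

    Assumption: this generic server-side momentum is for illustration only
    and is not claimed to match the paper's algorithm. beta=0.0 with
    server_lr=1.0 recovers plain FedAvg.
    """
    x = 0.0  # global model (a single scalar in this toy problem)
    m = 0.0  # server momentum buffer
    for _ in range(rounds):
        # Each client starts from the global model and takes local steps.
        deltas = [local_sgd(x, c, local_lr, local_steps) - x for c in centers]
        avg_delta = mean(deltas)   # server averages the client updates
        m = beta * m + avg_delta   # accumulate momentum across rounds
        x += server_lr * m         # apply the momentum-smoothed update
    return x


# Heterogeneous clients: each has a different local minimizer.
centers = [-3.0, 1.0, 8.0]
x_final = fedavg_server_momentum(centers)
print(x_final, mean(centers))  # x_final should be close to the global minimizer
```

Because every client in this toy problem shares the same curvature, plain FedAvg is already unbiased here, so the sketch shows only the structure of the update loop and says nothing about the convergence claims made in the abstract.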