Federated learning is an emerging distributed machine learning method that enables a large number of clients to train a model without exchanging their local data. Communication time is a major bottleneck in federated learning, especially when training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across all clients, but they are inefficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on its local history of squared gradients and on synchronized learning rates. Theoretical analysis shows that our client-specific auto-tuned learning rate scheduling converges and achieves linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and a Natural Language Processing (NLP) task show the efficacy of the proposed FedLALR method and coincide with our theoretical findings.
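To make the idea of a local AMSGrad variant with periodic synchronization more concrete, the following is a minimal sketch and not the FedLALR algorithm as specified in the paper. It assumes that "synchronized learning rates" can be approximated by averaging each client's model and its AMSGrad second-moment estimate after a fixed number of local steps. The function names (`local_amsgrad_step`, `average`, `federated_local_amsgrad`), the quadratic toy objective, and all hyperparameter values are hypothetical choices made for illustration.

```python
import math


def local_amsgrad_step(x, m, v, v_hat, grad, lr=0.05, beta1=0.9, beta2=0.99, eps=1e-8):
    """Apply one in-place AMSGrad update to a client's state (lists of floats)."""
    for i, g in enumerate(grad):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # local history of squared gradients
        v_hat[i] = max(v_hat[i], v[i])               # AMSGrad: the denominator never shrinks
        x[i] -= lr * m[i] / (math.sqrt(v_hat[i]) + eps)


def average(vectors):
    """Coordinate-wise mean of equally long lists."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def federated_local_amsgrad(targets, rounds=100, local_steps=5, lr=0.05):
    """Toy run: client k minimizes 0.5 * ||x - targets[k]||^2 (hypothetical objective)."""
    dim = len(targets[0])
    clients = [
        {"x": [0.0] * dim, "m": [0.0] * dim, "v": [0.0] * dim, "v_hat": [0.0] * dim}
        for _ in targets
    ]
    for _round in range(rounds):
        # Local phase: every client runs AMSGrad on its own data only.
        for state, target in zip(clients, targets):
            for _step in range(local_steps):
                grad = [xi - ti for xi, ti in zip(state["x"], target)]
                local_amsgrad_step(
                    state["x"], state["m"], state["v"], state["v_hat"], grad, lr=lr
                )
        # Synchronization (assumed form): average the models and the
        # second-moment estimates, then send both back to every client.
        x_avg = average([c["x"] for c in clients])
        v_hat_avg = average([c["v_hat"] for c in clients])
        for state in clients:
            state["x"] = list(x_avg)
            state["v_hat"] = list(v_hat_avg)
    return clients[0]["x"]


if __name__ == "__main__":
    # Heterogeneous clients: each one pulls the model toward a different target.
    print(federated_local_amsgrad([[1.0, -2.0], [3.0, 0.0], [-1.0, 4.0]]))
```

In this sketch the effective per-coordinate step size on each client is `lr / (sqrt(v_hat) + eps)`, so clients with different gradient histories take different steps between synchronizations, while the averaging step keeps their step sizes from drifting apart indefinitely. Which quantities are actually exchanged, and how often, should be taken from the paper itself.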