Federated learning (FL) is an emerging learning paradigm for tackling massively distributed data. In federated learning, a set of clients jointly perform a machine learning task under the coordination of a server. The FedAvg algorithm is one of the most widely used methods for solving federated learning problems. In FedAvg, the learning rate is constant rather than adaptive. Adaptive gradient methods have shown superior performance over constant learning-rate schedules; however, there is still no general framework for incorporating adaptive gradient methods into the federated setting. In this paper, we propose \textbf{FedDA}, a novel framework for local adaptive gradient methods. The framework adopts a restarted dual averaging technique and is compatible with various gradient estimation methods and adaptive learning-rate formulations. In particular, we analyze \textbf{FedDA-MVR}, an instantiation of our framework, and show that it achieves gradient complexity $\tilde{O}(\epsilon^{-1.5})$ and communication complexity $\tilde{O}(\epsilon^{-1})$ for finding an $\epsilon$-stationary point. This matches the best-known rate for first-order FL algorithms, and \textbf{FedDA-MVR} is the first adaptive FL algorithm to achieve this rate. We also perform extensive numerical experiments to verify the efficacy of our method.
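The abstract names the ingredients of FedDA without giving its update rules, so the following is only a hedged, toy-scale sketch of how such ingredients could fit together, not the paper's algorithm. It assumes that MVR refers to a STORM-style momentum-based variance-reduced gradient estimator, that "restarted dual averaging" means each client resets its dual accumulator `z` at the server's current point every round, and that the adaptive learning rate is an AdaGrad-norm scale `eta / sqrt(v)`. The quadratic client objectives, the functions `local_round` and `run_sketch`, and every hyperparameter are hypothetical choices for illustration.

```python
import math
import random


def noisy_grad(x, center, noise):
    # Hypothetical client objective 0.5 * ||x - center||^2, plus additive noise.
    return [xi - ci + ni for xi, ci, ni in zip(x, center, noise)]


def local_round(x_anchor, center, v, steps, eta, a, sigma, rng):
    """One client's local work in one round (illustrative sketch only).

    - Dual averaging restarted at x_anchor: z starts at zero and the primal
      point is x = x_anchor - (eta / sqrt(v)) * z.
    - d is a STORM-style MVR estimate: d_t = g(x_t; xi_t)
      + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)), with the same sample xi_t.
    - v is an AdaGrad-norm accumulator of ||d||^2, carried across rounds.
    """
    dim = len(x_anchor)
    x = list(x_anchor)
    noise = [rng.gauss(0.0, sigma) for _ in range(dim)]
    d = noisy_grad(x, center, noise)  # fresh stochastic gradient at the restart point
    z = [0.0] * dim
    for _ in range(steps):
        z = [zi + di for zi, di in zip(z, d)]  # accumulate gradient estimates
        v += sum(di * di for di in d)          # adaptive accumulator grows
        scale = eta / math.sqrt(v)
        x_prev = x
        x = [xa - scale * zi for xa, zi in zip(x_anchor, z)]
        # Evaluate the SAME noise sample at the new and the previous point.
        noise = [rng.gauss(0.0, sigma) for _ in range(dim)]
        g_new = noisy_grad(x, center, noise)
        g_old = noisy_grad(x_prev, center, noise)
        d = [gn + (1.0 - a) * (di - go) for gn, di, go in zip(g_new, d, g_old)]
    return x, v


def run_sketch(centers, rounds=30, steps=10, eta=0.5, a=0.3, sigma=0.05, seed=0):
    rng = random.Random(seed)
    dim = len(centers[0])
    x = [0.0] * dim                # server model
    v = [1.0] * len(centers)       # per-client accumulators; v0 = 1 avoids division by zero
    for _ in range(rounds):
        local = []
        for i, c in enumerate(centers):
            xi, v[i] = local_round(x, c, v[i], steps, eta, a, sigma, rng)
            local.append(xi)
        # Server: plain averaging of the clients' final local iterates.
        x = [sum(col) / len(local) for col in zip(*local)]
    return x


# Heterogeneous toy clients; the global minimizer is the mean center, (1.0, 1.0).
centers = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.8], [1.0, 1.0]]
x_final = run_sketch(centers)
print(x_final)
```

Restarting `z` each round keeps a client's local trajectory anchored to the latest server model, while the persistent `v` lets the step size shrink as gradient estimates accumulate; whether FedDA restarts or averages these states in exactly this way is not stated in the abstract.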