Adaptive optimization has achieved notable success in distributed learning, but extending adaptive optimizers to federated learning (FL) suffers from severe inefficiency, including (i) rugged convergence caused by inaccurate gradient estimation in the global adaptive optimizer and (ii) client drift exacerbated by local over-fitting with the local adaptive optimizer. In this work, we propose a novel momentum-based algorithm that combines global gradient descent with a locally adaptive amended optimizer to tackle these difficulties. Specifically, we incorporate a local amendment technique into the adaptive optimizer, yielding the Federated Local ADaptive Amended optimizer (\textit{FedLADA}), which estimates the global average offset from the previous communication round and corrects the local offset through a momentum-like term, improving empirical training speed and mitigating heterogeneous over-fitting. Theoretically, we establish the convergence rate of \textit{FedLADA} in the non-convex case under partial participation and show that it enjoys a linear speedup property. Moreover, extensive experiments on real-world data demonstrate the efficacy of \textit{FedLADA}: it greatly reduces the number of communication rounds and achieves higher accuracy than several baselines.
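To make the "locally amended" idea concrete, the following is a minimal sketch under stated assumptions, not the authors' \textit{FedLADA} update rule. It assumes (hypothetically) that each local step is a convex mix, weighted by a parameter `alpha`, of an Adam-style direction and the previous round's global average offset; that a client's local offset is its average per-step direction; that Adam moments are reset every round; and that client losses are toy quadratics $f_c(x) = \tfrac{1}{2}\lVert x - c\rVert^2$. The names `local_amended_adam`, `run_rounds`, and `global_loss` are illustrative only.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def local_amended_adam(
    x0: Sequence[float],
    grad_fn: Callable[[Vector], Vector],
    global_offset: Sequence[float],
    steps: int,
    lr: float = 0.05,
    alpha: float = 0.5,  # hypothetical mixing weight, not from the paper
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[Vector, Vector]:
    """Hypothetical amended local Adam. Returns (params, local_offset)."""
    x = list(x0)
    d = len(x)
    m, v = [0.0] * d, [0.0] * d  # Adam moments, reset every round (assumption)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        for i in range(d):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            adaptive = m_hat / (math.sqrt(v_hat) + eps)
            # Amended direction: mix local adaptive step with last round's global offset.
            x[i] -= lr * (alpha * adaptive + (1 - alpha) * global_offset[i])
    # Local offset = average per-step direction taken during this round.
    offset = [(a - b) / (lr * steps) for a, b in zip(x0, x)]
    return x, offset


def run_rounds(centers, x_init, rounds=50, clients_per_round=2, steps=10,
               server_lr=1.0, seed=0):
    """Toy FL loop with partial participation on quadratic client losses."""
    rng = random.Random(seed)
    x = list(x_init)
    global_offset = [0.0] * len(x)
    for _ in range(rounds):
        sampled = rng.sample(centers, clients_per_round)
        results = []
        for c in sampled:
            grad = lambda p, c=c: [pi - ci for pi, ci in zip(p, c)]
            results.append(local_amended_adam(x, grad, global_offset, steps))
        k = len(results)
        # Server: global gradient-descent step along the averaged client displacement.
        avg_delta = [sum(x[i] - r[0][i] for r in results) / k for i in range(len(x))]
        x = [xi - server_lr * di for xi, di in zip(x, avg_delta)]
        # Global average offset, sent to clients for the next round.
        global_offset = [sum(r[1][i] for r in results) / k for i in range(len(x))]
    return x


def global_loss(x, centers):
    """Average of the client losses 0.5 * ||x - c||^2."""
    return sum(0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
               for c in centers) / len(centers)


centers = [(1.0, 2.0), (3.0, 0.0), (2.0, 4.0), (0.0, 2.0)]
x_final = run_rounds(centers, x_init=[-5.0, -5.0])
print(global_loss([-5.0, -5.0], centers), global_loss(x_final, centers))
```

With `server_lr=1.0` the server step reduces to averaging the sampled clients' parameters; the offset term carries a decaying memory of the previous round's consensus direction, which is the role the abstract assigns to the momentum-like correction.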