Federated learning (FL) is an emerging distributed machine learning framework that jointly trains a global model across a large number of local devices while protecting data privacy. Its performance suffers from the non-vanishing biases introduced by inconsistent local optima and from the rugged client drift caused by local over-fitting. In this paper, we propose a novel and practical method, FedSpeed, to alleviate the negative impacts of these problems. Concretely, FedSpeed applies a prox-correction term to the current local updates to efficiently reduce the biases introduced by the prox-term, a regularizer that is necessary to maintain strong local consistency. Furthermore, FedSpeed merges the vanilla stochastic gradient with a perturbation computed from an extra gradient ascent step in the neighborhood, thereby alleviating local over-fitting. Our theoretical analysis indicates that the convergence rate depends on both the number of communication rounds $T$ and the local interval $K$, with an upper bound of $\mathcal{O}(1/T)$ when the local interval is set properly. Moreover, we conduct extensive experiments on a real-world dataset to demonstrate the efficiency of FedSpeed, which converges significantly faster than several baselines, including FedAvg, FedProx, FedCM, FedAdam, SCAFFOLD, FedDyn, and FedADMM, among others, and achieves state-of-the-art (SOTA) performance under general FL experimental settings.
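To make the three ingredients named in the abstract concrete (the prox-term, the prox-correction term, and the gradient merged with an extra ascent step), here is a minimal sketch of one client's local round. It is an illustration under stated assumptions, not the paper's reference implementation: the toy least-squares loss, the function names (`grad`, `local_step`, `local_round`), the hyperparameter names (`lr`, `rho`, `alpha`, `lam`), and the exact form of the correction update are all assumed for the example.

```python
import numpy as np

def grad(w, A, b):
    """Gradient of a toy local loss 0.5 * mean((A @ w - b) ** 2) (assumed for illustration)."""
    return A.T @ (A @ w - b) / len(b)

def local_step(w, w_global, correction, A, b, lr=0.1, rho=0.05, alpha=0.5, lam=1.0):
    """One local update combining the three ingredients (assumed form)."""
    g = grad(w, A, b)                            # vanilla gradient at the current point
    g_ascent = grad(w + rho * g, A, b)           # gradient after an extra ascent step
    merged = (1 - alpha) * g + alpha * g_ascent  # merge the two gradients
    prox = (w - w_global) / lam                  # prox-term: pull toward the global model
    # The correction term is subtracted to offset the bias the prox-term introduces.
    return w - lr * (merged - correction + prox)

def local_round(w_global, correction, A, b, steps=5, lam=1.0, **kwargs):
    """Run `steps` local updates, then refresh the correction term (assumed rule)."""
    w = w_global.copy()
    for _ in range(steps):
        w = local_step(w, w_global, correction, A, b, lam=lam, **kwargs)
    new_correction = correction - (w - w_global) / lam
    return w, new_correction
```

In this sketch, setting `alpha=0` disables the ascent-based perturbation and leaves a plain proximal gradient step, which makes the role of each term easy to isolate when experimenting.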