Federated Learning (FL) has emerged as a standard paradigm for distributed model training. It enables multiple devices to cooperatively train a shared model on their own datasets, coordinated by a central server, while keeping private data local. However, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the frequent communication among participants can significantly degrade training performance, slow convergence, and increase communication cost. In this paper, we improve the standard stochastic gradient descent approach by introducing aggregated gradients at each local update epoch, and we propose an adaptive learning rate iterative algorithm that further takes into account the deviation between the local parameter and the global parameter. This adaptive learning rate design requires local information from all clients, which is challenging to obtain because there is no communication during the local update epochs. To obtain a decentralized adaptive learning rate for each client, we introduce a mean-field approach that uses two mean-field terms to estimate the average local parameters and the average gradients, respectively, without requiring clients to exchange local information with each other over time. Through theoretical analysis, we prove that our method provides a convergence guarantee for model training and derive a convergent upper bound on the client drift term. Extensive numerical results on real-world datasets with both IID and Non-IID data distributions show that the proposed framework outperforms state-of-the-art FL schemes in both model accuracy and convergence rate.
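To make the ideas summarized in the abstract concrete, the following is a minimal toy sketch, not the algorithm proposed in the paper. The abstract does not give the update rules, so everything here is an assumption made for illustration: the function names `adaptive_lr`, `local_update`, and `train`, the scalar quadratic client losses, the rule that divides the step size by one plus the local-global deviation, and the `mix` weight that blends the local gradient with an average-gradient estimate. The average-gradient estimate is computed by the server at the start of each round as a simple stand-in for a mean-field term, so that no information is exchanged among clients during the local steps.

```python
from typing import List


def adaptive_lr(base_lr: float, local_w: float, global_w: float,
                penalty: float = 1.0) -> float:
    """Hypothetical rule: shrink the step as the local parameter
    drifts away from the global parameter."""
    return base_lr / (1.0 + penalty * abs(local_w - global_w))


def local_update(global_w: float, target: float, mean_grad_est: float,
                 steps: int = 5, base_lr: float = 0.1,
                 mix: float = 0.5) -> float:
    """Run a few local steps on the toy loss 0.5 * (w - target)**2.

    mean_grad_est is a fixed estimate of the average gradient across
    clients (an assumed stand-in for a mean-field term); it is not
    refreshed during the local steps, so no communication is needed.
    """
    w = global_w
    for _ in range(steps):
        local_grad = w - target  # gradient of the toy quadratic loss
        # Blend the local gradient with the estimated average gradient.
        grad = (1.0 - mix) * local_grad + mix * mean_grad_est
        w -= adaptive_lr(base_lr, w, global_w) * grad
    return w


def train(targets: List[float], rounds: int = 50) -> float:
    """Toy federated loop: broadcast, local updates, plain averaging."""
    global_w = 0.0
    for _ in range(rounds):
        # Server-side estimate of the average gradient at global_w.
        mean_grad_est = sum(global_w - t for t in targets) / len(targets)
        local_ws = [local_update(global_w, t, mean_grad_est)
                    for t in targets]
        global_w = sum(local_ws) / len(local_ws)
    return global_w


if __name__ == "__main__":
    # Three clients with different optima mimic Non-IID data.
    print(train([1.0, 2.0, 6.0]))
```

In this toy setting the global parameter should settle close to the average of the client targets, though not exactly on it, because the deviation-dependent step size differs from client to client. The paper's actual learning rate and mean-field estimators may differ substantially from this sketch.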