Federated Learning (FL) has emerged as a pivotal paradigm in distributed model training: multiple devices collaborate to refine a shared model on their respective datasets, orchestrated by a central server, while private data stays local. Nonetheless, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the frequent information exchange among participants may markedly impede training efficacy and slow convergence. In this paper, we refine the conventional stochastic gradient descent (SGD) methodology by introducing aggregated gradients at each local training epoch, and we propose an adaptive learning rate iterative algorithm that accounts for the divergence between local and average parameters. To overcome the difficulty of acquiring other clients' local information, we introduce a mean-field approach that uses two mean-field terms to approximately estimate the average local parameters and gradients over time, which removes the need for local information exchange among clients, and we design a decentralized adaptive learning rate for each client. Through rigorous theoretical analysis, we provide a convergence guarantee for our proposed algorithm and establish its wide applicability. Our numerical experiments substantiate the superiority of our framework over existing state-of-the-art FL strategies in enhancing model performance and accelerating convergence under both IID and Non-IID data distributions.
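To make the idea of a divergence-aware learning rate concrete, the following is a minimal toy sketch and not the algorithm proposed in the paper. It assumes scalar quadratic client objectives, a hypothetical rule `eta = eta0 / (1 + beta * |w - phi_w|)`, and uses the last broadcast global parameter as a stand-in for the mean-field estimate of the average local parameter. The aggregated-gradient term and the actual mean-field estimators are not reproduced here; all function and parameter names are illustrative.

```python
def local_update(w_global, target, phi_w, steps=5, eta0=0.5, beta=1.0):
    """Run local gradient steps on the toy objective f(w) = 0.5 * (w - target)**2.

    phi_w is an estimate of the average local parameter that the client can
    use without talking to other clients (assumption: a fixed value per round).
    """
    w = w_global
    for _ in range(steps):
        grad = w - target                    # gradient of the toy quadratic
        divergence = abs(w - phi_w)          # drift from the estimated average
        eta = eta0 / (1.0 + beta * divergence)  # hypothetical adaptive rule
        w -= eta * grad
    return w


def federated_round(w_global, targets, **kwargs):
    """One communication round: local training, then plain averaging."""
    # Assumption: the broadcast global model plays the role of the
    # mean-field estimate, so no client-to-client exchange is needed.
    phi_w = w_global
    local_models = [local_update(w_global, t, phi_w, **kwargs) for t in targets]
    return sum(local_models) / len(local_models)


def train(targets, rounds=50, w0=0.0, **kwargs):
    """Repeat federated rounds starting from the initial parameter w0."""
    w = w0
    for _ in range(rounds):
        w = federated_round(w, targets, **kwargs)
    return w
```

In this sketch, `targets` with identical entries mimics IID clients, while spread-out entries mimic Non-IID clients. A client whose parameter drifts far from the estimated average takes smaller steps, which is one simple way to damp client drift.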