Federated Learning (FL) has emerged as a crucial distributed training paradigm, enabling decentralized devices to collaboratively train a shared model under the coordination of a central server while leveraging their locally stored private data. Nonetheless, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the frequent information exchange among participants can significantly degrade training efficacy, slow model convergence, and increase the risk of privacy leakage. To alleviate the divergence between the local and average model parameters and accelerate model convergence, we propose an adaptive FEDerated learning algorithm called FedAgg, which refines the conventional stochastic gradient descent (SGD) method with an AGgregated Gradient term at each local training epoch and adaptively adjusts the learning rate based on a penalty term that quantifies the local model deviation. To avoid inter-client information exchange during local training and to design a decentralized adaptive learning rate for each client, we introduce two mean-field terms that approximate the average local parameters and gradients over time. Through rigorous theoretical analysis, we establish the existence and convergence of the mean-field terms and derive an upper bound on the convergence rate of our proposed algorithm. Extensive experimental results on real-world datasets substantiate the superiority of our framework over existing state-of-the-art FL strategies in enhancing model performance and accelerating convergence on both IID and Non-IID datasets.
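To make the high-level description concrete, the following Python sketch simulates one plausible form of the update the abstract describes on synthetic quadratic objectives. It is a minimal illustration under stated assumptions, not the paper's algorithm: the exact recursion, the adaptive-learning-rate formula `base_lr / (1 + penalty * drift)`, and all names (`base_lr`, `penalty`, `centers`) are hypothetical stand-ins for the penalty term, mean-field terms, and aggregated gradient term named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim, local_epochs, rounds = 5, 10, 3, 20
base_lr, penalty = 0.1, 1.0  # hypothetical hyperparameters

# Heterogeneous quadratics F_i(w) = 0.5 * ||w - c_i||^2 stand in for
# Non-IID local losses; c_i is client i's local optimum.
centers = rng.normal(size=(num_clients, dim))

def grad(w, c):
    # Gradient of F_i at w.
    return w - c

w = np.zeros((num_clients, dim))  # local models
for t in range(rounds):
    # Mean-field estimates of the average parameters and gradients,
    # held fixed during local training so clients never need to
    # exchange information with each other mid-round.
    w_bar = w.mean(axis=0)
    g_bar = np.mean([grad(w[i], centers[i]) for i in range(num_clients)], axis=0)
    for i in range(num_clients):
        for _ in range(local_epochs):
            drift = np.linalg.norm(w[i] - w_bar)    # local model deviation
            eta = base_lr / (1.0 + penalty * drift) # adaptive learning rate
            # Local SGD step augmented with the aggregated gradient term.
            w[i] = w[i] - eta * (grad(w[i], centers[i]) + g_bar)
    w[:] = w.mean(axis=0)  # server-side averaging, as in FedAvg

print("distance to global optimum:", np.linalg.norm(w[0] - centers.mean(axis=0)))
```

Freezing `w_bar` and `g_bar` within each round mirrors the role the abstract assigns to the mean-field terms: every client computes its adaptive step size from a locally available deviation estimate rather than from live peer-to-peer communication.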