FedAgg: Adaptive Federated Learning with Aggregated Gradients

Federated Learning (FL) has emerged as a paradigm for distributed model training: multiple devices, scheduled by a central server, cooperatively train a shared model on their own datasets while keeping private data local. During training, however, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the frequent communication among participants can significantly degrade training performance, slow the convergence rate, and increase communication cost. In this paper, we improve on standard stochastic gradient descent by introducing aggregated gradients at each local update epoch, and we propose an adaptive learning rate iterative algorithm that also accounts for the deviation between the local parameter and the global parameter. This adaptive learning rate design requires local information from all clients, which is challenging to obtain because there is no communication during the local update epochs. To obtain a decentralized adaptive learning rate for each client, we introduce a mean-field approach that uses two mean-field terms to estimate the average local parameters and the average gradients, respectively, without clients exchanging local information with each other over time. Through theoretical analysis, we prove that our method provides a convergence guarantee for model training and derive an upper bound for the client drift term. Extensive numerical results show that the proposed framework outperforms state-of-the-art FL schemes in both model accuracy and convergence rate on real-world datasets with IID and Non-IID data distributions.
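As a rough illustration of the ideas summarized above, the following Python sketch runs a toy federated round in which each client mixes its own gradient with an estimate of the average gradient and shrinks its step size as its local model drifts from a reference model. The abstract does not state the actual update rule, so everything specific here is an assumption made for illustration: the quadratic toy loss, the equal blend of local and average gradient, the rule `base_lr / (1 + penalty * deviation)`, the use of round-start values as stand-ins for the two mean-field terms, and all function names. It is not the paper's algorithm.

```python
import math


def grad_quadratic(w, center):
    """Gradient of the toy client loss 0.5 * ||w - center||^2 (assumed loss)."""
    return [wi - ci for wi, ci in zip(w, center)]


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def adaptive_lr(w_local, w_ref, base_lr=0.5, penalty=1.0):
    """Hypothetical rule: the step shrinks as w_local drifts away from w_ref."""
    deviation = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_local, w_ref)))
    return base_lr / (1.0 + penalty * deviation)


def run_round(w_global, centers, local_steps=5):
    """One communication round; returns the averaged (new global) model."""
    # Stand-ins for the two mean-field terms. They are fixed at the start of
    # the round, so clients need no communication during their local steps.
    mean_param = list(w_global)
    mean_grad = mean_vector([grad_quadratic(w_global, c) for c in centers])

    local_models = []
    for center in centers:
        w = list(w_global)
        for _ in range(local_steps):
            own_grad = grad_quadratic(w, center)
            # Blend the client's own gradient with the estimated average.
            step_dir = [0.5 * g + 0.5 * m for g, m in zip(own_grad, mean_grad)]
            lr = adaptive_lr(w, mean_param)
            w = [wi - lr * d for wi, d in zip(w, step_dir)]
        local_models.append(w)
    return mean_vector(local_models)


if __name__ == "__main__":
    # Three clients with different optima mimic Non-IID data.
    centers = [[0.0, 0.0], [2.0, 2.0], [4.0, 1.0]]
    w = [10.0, 10.0]
    for _ in range(20):
        w = run_round(w, centers)
    print("global model after 20 rounds:", w)
```

Fixing the two estimates at the start of each round is only one simple way to avoid communication during local steps; the mean-field estimators referred to in the abstract may be constructed quite differently.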

