Adaptive Federated Learning Over the Air

We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms, encompassing the effects of channel fading and interference, for a broad spectrum of nonconvex loss functions. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of $\mathcal{O}( \ln{(T)} /{ T^{ 1 - \frac{1}{\alpha} } } )$, where $\alpha$ represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the $\mathcal{O}( 1/T )$ rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.
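
The abstract describes the pipeline only at a high level. As a rough illustration, here is a minimal NumPy sketch built on several assumptions. Clients send analog gradients that superpose over a fading channel, with Rayleigh magnitudes normalized to unit mean. The server receives additive symmetric α-stable interference, sampled with the Chambers-Mallows-Stuck method; α = 1.5 is an arbitrary choice. Each client has a toy quadratic loss, and the server applies a standard AdaGrad or Adam step to the received gradient. All function names, the loss, step sizes, and channel parameters are hypothetical. This is not the authors' algorithm and does not reproduce their convergence rates.

```python
import numpy as np


def symmetric_alpha_stable(alpha, size, rng):
    """Standard symmetric alpha-stable samples (Chambers-Mallows-Stuck, alpha != 1)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))


def over_the_air_gradient(w, centers, alpha, noise_scale, rng):
    """Hypothetical channel model: faded superposition plus heavy-tailed interference, divided by K."""
    K, d = centers.shape
    # Toy loss f_k(w) = 0.5 * ||w - c_k||^2 (an assumption), so client k's gradient is w - c_k.
    local_grads = w - centers                                  # shape (K, d)
    # Rayleigh fading per client, scaled so that E[h_k] = 1.
    h = rng.rayleigh(scale=np.sqrt(2 / np.pi), size=(K, 1))
    interference = noise_scale * symmetric_alpha_stable(alpha, d, rng)
    received = (h * local_grads).sum(axis=0) + interference    # analog superposition at the server
    return received / K


def train(method, rounds=300, K=10, d=5, alpha=1.5, noise_scale=0.5, seed=0):
    """Run server-side AdaGrad or Adam on the over-the-air gradient; return (final, initial) distance to optimum."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(3.0, 1.0, size=(K, d))
    w = np.zeros(d)
    m, v = np.zeros(d), np.zeros(d)
    eps = 1e-8
    for t in range(1, rounds + 1):
        g = over_the_air_gradient(w, centers, alpha, noise_scale, rng)
        if method == "adagrad":
            v += g ** 2                          # accumulator never forgets past spikes
            w -= 0.5 * g / (np.sqrt(v) + eps)
        elif method == "adam":
            b1, b2 = 0.9, 0.999
            m = b1 * m + (1 - b1) * g            # exponential moving averages
            v = b2 * v + (1 - b2) * g ** 2
            m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
            w -= 0.05 * m_hat / (np.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown method: {method}")
    optimum = centers.mean(axis=0)               # minimizer of the average toy loss
    return np.linalg.norm(w - optimum), np.linalg.norm(optimum)


if __name__ == "__main__":
    for method in ("adagrad", "adam"):
        err, init = train(method)
        print(f"{method}: distance to optimum {err:.3f} (initial {init:.3f})")
```

Lowering `alpha` toward 1 makes interference spikes more frequent. In the AdaGrad branch each spike permanently inflates the accumulator `v`, whereas Adam's moving average gradually forgets it. This offers one informal way to picture why heavier tails may hurt AdaGrad more; it is an intuition, not the paper's analysis.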

翻译：我们提出了一种在空口模型训练框架下的自适应梯度方法联邦版本，特别针对AdaGrad和Adam算法。该方法利用无线信道固有的叠加特性，实现了快速且可扩展的参数聚合。同时，通过根据全局梯度更新动态调整步长，增强了模型训练过程的鲁棒性。我们推导了训练算法的收敛速率，该速率涵盖了信道衰落和干扰对广泛非凸损失函数的影响。分析表明，基于AdaGrad的算法以$\mathcal{O}( \ln{(T)} /{ T^{ 1 - \frac{1}{\alpha} } } )$的速率收敛到稳定点，其中$\alpha$表示电磁干扰的尾指数。这一结果表明，干扰分布的重尾程度对训练效率起着关键作用：尾部越重，算法收敛越慢。相比之下，类Adam算法以$\mathcal{O}( 1/T )$速率收敛，展示了其在加速模型训练过程中的优势。我们进行了大量实验，验证了我们的理论发现，并确认了所提出的联邦自适应梯度方法的实际有效性。