A Hierarchical Gradient Tracking Algorithm for Mitigating Subnet-Drift in Fog Learning Networks

Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology. Semi-decentralized FL (SD-FL) offers a solution for device-to-device (D2D) enabled networks by dividing model cooperation into two stages: at the lower stage, D2D communication is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communication for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT yields upper bounds on convergence for non-convex, convex, and strongly convex problems. We show how these bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning the subnet sampling rate and the number of D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets.
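The two-stage communication pattern described in the abstract can be illustrated with a minimal sketch. This is not the SD-GT algorithm: the abstract does not specify the tracking terms, so they are omitted, and local gradient steps are left out as well. The ring topology, the 1/3 mixing weights, the rule that the server pulls one random device from each sampled subnet, and all function names are assumptions made for illustration only.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for n devices on a ring (assumed topology)."""
    if n < 3:
        # With one or two devices, uniform averaging is already doubly stochastic.
        return np.full((n, n), 1.0 / n)
    W = np.zeros((n, n))
    for i in range(n):
        # Each device weights itself and its two ring neighbors equally.
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def d2d_rounds(models: np.ndarray, W: np.ndarray, rounds: int) -> np.ndarray:
    """Lower stage: `rounds` D2D averaging steps inside one subnet.

    models has shape (n_devices, dim); each row is one device's model.
    """
    for _ in range(rounds):
        models = W @ models
    return models


def global_aggregate(subnet_models, rng, sample_rate=1.0) -> np.ndarray:
    """Upper stage: the server samples a fraction of subnets, pulls one
    randomly chosen device model from each (an assumption), and averages."""
    n_subnets = len(subnet_models)
    n_sampled = max(1, int(round(sample_rate * n_subnets)))
    chosen = rng.choice(n_subnets, size=n_sampled, replace=False)
    picks = [subnet_models[s][rng.integers(len(subnet_models[s]))] for s in chosen]
    return np.mean(picks, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three subnets of four devices each, with 2-dimensional toy "models".
    subnets = [rng.normal(size=(4, 2)) for _ in range(3)]
    W = ring_mixing_matrix(4)
    mixed = [d2d_rounds(m, W, rounds=5) for m in subnets]
    print(global_aggregate(mixed, rng, sample_rate=2 / 3))
```

In this toy setup, more D2D rounds bring every device closer to its subnet average, so a single sampled device represents its subnet more faithfully, while a lower sampling rate reduces device-server traffic. These are the two quantities the abstract says are tuned for each global training interval.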


