Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology. Semi-decentralized FL (SD-FL) addresses this for device-to-device (D2D) enabled networks by dividing model cooperation into two stages: at the lower stage, D2D communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT yields upper bounds on convergence for non-convex, convex, and strongly convex problems. We show how these bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning the subnet sampling rate and the number of D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets.