Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) has been proposed as a solution for device-to-device (D2D) enabled networks, dividing model cooperation into two stages: at the lower stage, D2D communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates at each communication layer. Our analytical characterization of SD-GT provides convergence upper bounds for non-convex, convex, and strongly convex problems. We show how these bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning the subnet sampling rate and the number of D2D rounds in each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication efficiency relative to SD-FL and gradient-tracking baselines on several datasets.
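To make the tracking idea concrete, the sketch below shows classical decentralized gradient tracking within a single D2D subnet. It is a minimal illustration, not the paper's full SD-GT algorithm (which additionally maintains a tracking term for the device-server layer and tunes subnet sampling and D2D round counts); the quadratic losses, ring mixing matrix W, step size eta, and iteration count are illustrative assumptions.

```python
# Minimal sketch: decentralized gradient tracking in one D2D subnet.
# NOT the full SD-GT algorithm from the paper; all problem data here
# (quadratic losses, ring topology W, step size) are assumptions made
# purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3  # devices in the subnet, model dimension

# Heterogeneous local losses f_i(x) = 0.5 * ||A_i x - b_i||^2
A = [rng.standard_normal((5, d)) for _ in range(n)]
b = [rng.standard_normal(5) for _ in range(n)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])

# Doubly-stochastic mixing matrix for a 4-device ring (D2D links)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

eta = 0.02
x = np.zeros((n, d))                              # local models
y = np.stack([grad(i, x[i]) for i in range(n)])   # tracking variables

for _ in range(500):
    x_new = W @ x - eta * y  # D2D mixing + descent along tracked gradient
    # Tracker update: mix neighbors' trackers, add local gradient change,
    # preserving the invariant mean(y) == mean of local gradients.
    y = W @ y + np.stack([grad(i, x_new[i]) - grad(i, x[i])
                          for i in range(n)])
    x = x_new

# Local models converge to the minimizer of the subnet-average loss
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("optimality gap :", np.linalg.norm(x.mean(axis=0) - x_star))
```

Because each device's tracker y_i converges to the subnet-average gradient, descent directions stay aligned even when local data are fully heterogeneous, which is the property that gradient-diversity assumptions would otherwise have to guarantee.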