Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) addresses this by dividing model cooperation into two stages: at the lower stage, device-to-device (D2D) communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT yields convergence upper bounds for both non-convex and strongly-convex problems under a suitable choice of step size. We use the resulting bounds to develop a co-optimization algorithm that selects subnet sampling rates and the number of D2D rounds according to a performance-efficiency trade-off. Our numerical evaluations on several datasets demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking.
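To make the two-stage structure concrete, the following is a minimal sketch of one generic semi-decentralized aggregation round: several D2D mixing steps inside each subnet, followed by a server average over a sampled fraction of subnets. It is not the SD-GT algorithm; the tracking terms and local gradient steps are omitted, and every name here (`d2d_consensus`, `two_stage_round`, the mixing matrices, `sample_rate`) is a hypothetical choice made for illustration. The sketch assumes each subnet's mixing matrix is doubly stochastic and that the server reads one device per sampled subnet.

```python
import numpy as np

def d2d_consensus(models, mixing, rounds):
    """Lower stage: apply `rounds` D2D mixing steps within one subnet.

    models: (num_devices, dim) array, one row per device model.
    mixing: (num_devices, num_devices) doubly stochastic matrix (assumed).
    """
    for _ in range(rounds):
        models = mixing @ models
    return models

def two_stage_round(subnet_models, mixings, d2d_rounds, sample_rate, rng):
    """One illustrative round: D2D mixing per subnet, then server averaging."""
    mixed = [d2d_consensus(m, w, d2d_rounds)
             for m, w in zip(subnet_models, mixings)]

    # Upper stage: sample a fraction of subnets (at least one).
    num_subnets = len(mixed)
    num_sampled = max(1, int(round(sample_rate * num_subnets)))
    sampled = rng.choice(num_subnets, size=num_sampled, replace=False)

    # The server reads one randomly chosen device from each sampled subnet.
    picks = [mixed[s][rng.integers(len(mixed[s]))] for s in sampled]
    global_model = np.mean(picks, axis=0)

    # Broadcast the global model back to every device in every subnet.
    new_models = [np.tile(global_model, (len(m), 1)) for m in mixed]
    return new_models, global_model

# Toy usage: two subnets with scalar models and uniform (complete-graph) mixing.
rng = np.random.default_rng(0)
subnets = [np.array([[0.0], [2.0]]), np.array([[3.0], [5.0], [7.0]])]
mixings = [np.full((2, 2), 1 / 2), np.full((3, 3), 1 / 3)]
new_models, global_model = two_stage_round(
    subnets, mixings, d2d_rounds=1, sample_rate=1.0, rng=rng)
```

With uniform mixing, a single D2D round already brings each subnet to its exact average, so the server result does not depend on which device it reads. With sparser mixing matrices, more D2D rounds bring the devices closer to consensus before the server step, which is the kind of cost-versus-accuracy knob the abstract's trade-off over D2D rounds and sampling rates refers to.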