Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) addresses this by dividing model cooperation into two stages: at the lower stage, device-to-device (D2D) communications are employed for local model aggregation within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregation. However, existing SD-FL schemes rely on gradient diversity assumptions, which become performance bottlenecks as data distributions grow more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates at each communication layer. Our analytical characterization of SD-GT establishes convergence upper bounds for both non-convex and strongly-convex problems under a suitable choice of step size. We employ the resulting bounds to develop a co-optimization algorithm that tunes subnet sampling rates and the number of D2D rounds according to a performance-efficiency trade-off. Our numerical evaluations on several datasets demonstrate that SD-GT achieves substantial improvements in trained model quality and communication cost relative to SD-FL and gradient tracking baselines.
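The abstract does not state the SD-GT update rules, so the following is only a minimal sketch, under stated assumptions, of how tracking terms can enter both communication layers; it is not the authors' algorithm. Everything here is hypothetical: scalar quadratic local losses f_i(x) = 0.5 (x - b_i)^2, a ring mixing matrix inside each subnet, DIGing-style gradient tracking during the D2D rounds, and a SCAFFOLD-like correction z (global average gradient minus subnet average gradient) computed at the server model and held fixed for one global round. All subnets participate, so subnet sampling and the co-optimization step are omitted. The names `grad`, `ring_mix`, and `sd_gt_sketch` and their default parameters are illustrative.

```python
def grad(b_i, x):
    """Gradient of the hypothetical local loss f_i(x) = 0.5 * (x - b_i) ** 2."""
    return x - b_i


def ring_mix(values):
    """One hypothetical D2D mixing step on a ring: keep half, a quarter from each neighbor.

    The weights form a doubly stochastic matrix, so the subnet average is preserved.
    """
    m = len(values)
    return [0.5 * values[k] + 0.25 * values[(k - 1) % m] + 0.25 * values[(k + 1) % m]
            for k in range(m)]


def sd_gt_sketch(subnets, alpha=0.1, d2d_rounds=5, global_rounds=50):
    """Hypothetical two-layer gradient tracking; `subnets` is a list of lists of b_i."""
    all_b = [b for subnet in subnets for b in subnet]
    x_global = 0.0
    for _ in range(global_rounds):
        # Upper layer (DS): global average gradient at the server model (assumed available).
        global_grad = sum(grad(b, x_global) for b in all_b) / len(all_b)
        device_models = []
        for subnet in subnets:
            subnet_grad = sum(grad(b, x_global) for b in subnet) / len(subnet)
            # Correction that shifts the subnet's average gradient onto the global one.
            z = global_grad - subnet_grad
            x = [x_global] * len(subnet)
            # Trackers start at the corrected local gradients.
            y = [grad(b, x_global) + z for b in subnet]
            for _ in range(d2d_rounds):
                # Lower layer (D2D): DIGing-style gradient tracking inside the subnet.
                x_new = [xm - alpha * ym for xm, ym in zip(ring_mix(x), y)]
                # z is constant within the round, so it cancels in the gradient difference.
                y = [ym + grad(b, xn) - grad(b, xo)
                     for ym, b, xn, xo in zip(ring_mix(y), subnet, x_new, x)]
                x = x_new
            device_models.extend(x)
        # Global aggregation: the server averages all device models.
        x_global = sum(device_models) / len(device_models)
    return x_global


if __name__ == "__main__":
    example_subnets = [[1.0, 2.0, 3.0], [10.0, 12.0], [-4.0, 0.0, 5.0, 7.0]]
    print(sd_gt_sketch(example_subnets))
```

With the example subnets above, the returned model approaches 4.0, the mean of all b_i and hence the global minimizer, because the correction makes each subnet's tracked average gradient equal to the global one. Since every loss in this toy case has the same curvature, it does not exhibit the bias that gradient diversity causes in untracked schemes; it only illustrates the mechanics of tracking at both layers.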