Optimal Routing for Federated Learning over Dynamic Satellite Networks: Tractable or Not?

Federated learning (FL) is a key paradigm for distributed model training across decentralized data sources. Communication in each FL round typically consists of two phases: (i) distributing the global model from a server to clients, and (ii) collecting updated local models from clients at the server for aggregation. This paper focuses on a type of FL in which communication between a client and the server is relay-based over a dynamic network, making routing optimization essential. A typical scenario is in-orbit FL, where satellites act as clients and communicate with a server (which can be a satellite, ground station, or aerial platform) via multi-hop inter-satellite links. This paper presents a comprehensive tractability analysis of routing optimization for in-orbit FL under different settings. For global model distribution, the settings cover the number of models, the objective function, and the routing scheme (unicast versus multicast, and splittable versus unsplittable flow). For local model collection, the settings cover the number of models, client selection, and flow splittability. For each case, we rigorously prove either that the global optimum is obtainable in polynomial time or that the problem is NP-hard. Together, our analysis draws clear boundaries between tractable and intractable regimes for a broad spectrum of routing problems for in-orbit FL. For tractable cases, the efficient algorithms we derive are directly applicable in practice. For intractable cases, we provide fundamental insights into their inherent complexity. These contributions fill a critical yet unexplored research gap, laying a foundation for principled routing design, evaluation, and deployment in satellite-based FL and similar distributed learning systems.
