The rapid advancement of large AI models places stringent demands on data volume and computational resources. Federated learning is designed to exploit distributed data and computational resources, yet it suffers from data shortage caused by limited network coverage and from the computational constraints of edge devices. The mixture-of-experts (MoE) architecture and satellite-terrestrial networks (STNs) offer promising solutions to these issues, providing lightweight computation and broad coverage, respectively. However, the relative motion between satellites and ground devices causes intermittent connectivity. This hinders conventional federated learning, which relies on model synchronization across devices. To leverage the coverage of STNs while preserving training efficiency, we propose EMS-FL, an expert-driven model splitting and federated learning method. EMS-FL assigns each device cluster only the experts that are highly correlated with its local data. Building on these non-overlapping expert assignments, we further propose asynchronous local learning. In this scheme, each device cluster trains its assigned experts continuously and uploads its local parameters to the satellite only during connected phases for aggregation and model updates. Consequently, EMS-FL reduces training overhead and achieves faster convergence and higher accuracy than conventional federated learning. A rigorous convergence analysis theoretically characterizes its learning performance. Furthermore, comprehensive experiments on public datasets and large models validate the superiority of EMS-FL.
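The abstract does not specify how expert-cluster correlation is measured or how assignments are computed. The toy Python below is a minimal sketch under stated assumptions. It illustrates why non-overlapping expert assignments permit asynchronous aggregation over an intermittent satellite link. Everything in it is hypothetical and is not EMS-FL's actual procedure: the correlation matrix `corr`, the greedy capacity-limited assignment rule, the `connected` visibility schedule, and the single-float stand-ins for expert parameters.

```python
import math
from typing import Callable, Dict, List

# Hypothetical input: corr[c][e] scores how strongly cluster c's local data
# relates to expert e (e.g., an average router gating weight). This scoring
# and the greedy rule below are illustrative assumptions, not EMS-FL's method.


def assign_experts(corr: List[List[float]]) -> Dict[int, List[int]]:
    """Give every expert to exactly one cluster (non-overlapping assignment)."""
    num_clusters, num_experts = len(corr), len(corr[0])
    capacity = math.ceil(num_experts / num_clusters)  # cap experts per cluster
    pairs = sorted(
        ((corr[c][e], c, e) for c in range(num_clusters) for e in range(num_experts)),
        reverse=True,  # visit the most correlated (cluster, expert) pairs first
    )
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_clusters)}
    taken = set()
    for _, c, e in pairs:
        if e not in taken and len(assignment[c]) < capacity:
            assignment[c].append(e)
            taken.add(e)
    return {c: sorted(es) for c, es in assignment.items()}


def run_async_rounds(
    assignment: Dict[int, List[int]],
    connected: Callable[[int, int], bool],
    rounds: int,
    lr: float = 0.1,
) -> Dict[int, float]:
    """Toy asynchronous loop: clusters train every round, upload only when connected."""
    # Each expert's parameters are reduced to a single float for illustration;
    # shared (non-expert) layers are ignored in this toy.
    global_experts = {e: 0.0 for es in assignment.values() for e in es}
    local = {c: {e: global_experts[e] for e in es} for c, es in assignment.items()}
    for t in range(rounds):
        for c, es in assignment.items():
            for e in es:
                local[c][e] += lr  # hypothetical stand-in for one local training step
            if connected(c, t):
                # Experts are disjoint, so the satellite can overwrite each
                # uploaded expert directly; no cross-cluster averaging is needed.
                for e in es:
                    global_experts[e] = local[c][e]
    return global_experts


if __name__ == "__main__":
    corr = [[0.9, 0.1, 0.7, 0.2],
            [0.3, 0.8, 0.6, 0.5]]
    assignment = assign_experts(corr)
    print(assignment)  # {0: [0, 2], 1: [1, 3]}
    # Hypothetical visibility window: cluster 0 sees the satellite on even
    # rounds, cluster 1 on odd rounds.
    schedule = lambda c, t: (t + c) % 2 == 0
    print(run_async_rounds(assignment, schedule, rounds=4, lr=0.5))
    # {0: 1.5, 2: 1.5, 1: 2.0, 3: 2.0}
```

In this sketch, each expert has exactly one owning cluster. A cluster can therefore keep training between contact windows, and the satellite never has to wait for or reconcile concurrent updates to the same expert. This is the property the abstract attributes to non-overlapping assignment. Any real scoring function, load-balancing rule, or handling of shared layers would need to come from the paper itself.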