Split federated learning (SFL) has emerged as a promising approach to training large artificial intelligence (AI) models at the network edge by distributing computation between edge devices and a server. However, unstable network environments pose significant challenges to SFL, and prior schemes often overlook this effect by assuming perfect client participation, which renders them impractical in real-world scenarios. In this work, we develop an optimization framework for SFL with unstable client participation. We derive the first convergence upper bound for SFL under unstable client participation, accounting for activation uploading failures, gradient downloading failures, and model aggregation failures. Based on this bound, we formulate a joint optimization problem over client sampling and model splitting that minimizes the upper bound, and we develop an efficient approach that solves this problem optimally. Extensive simulations on EMNIST and CIFAR-10 demonstrate that the proposed framework outperforms existing benchmarks.