Split Federated Learning (SFL) enables collaborative training between resource-constrained edge devices and a compute-rich server. Communication overhead is a central issue in SFL and can be mitigated with auxiliary networks, yet the fundamental client-side computation challenge remains: back-propagation incurs substantial memory and computation costs, severely limiting the scale of models that edge devices can support. To make client computation more resource-efficient and reduce client-server communication, we propose HERON-SFL, a novel hybrid optimization framework that applies zeroth-order (ZO) optimization to local client training while retaining first-order (FO) optimization on the server. With the assistance of auxiliary networks, ZO updates let clients approximate local gradients using perturbed forward-only evaluations at each step, eliminating memory-intensive activation caching and the explicit gradient computation of conventional back-propagation. Leveraging a low effective rank assumption, we theoretically prove that HERON-SFL's convergence rate is independent of model dimensionality, addressing a key scalability concern common to ZO algorithms. Empirically, on ResNet training and language model (LM) fine-tuning tasks, HERON-SFL matches benchmark accuracy while reducing client peak memory by up to 64% and per-step client-side compute by up to 33%, substantially expanding the range of models that can be trained or adapted on resource-limited devices.
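To illustrate the forward-only client update described above, the following is a minimal sketch of a two-point zeroth-order (SPSA-style) gradient estimator. It is an illustrative example, not HERON-SFL's actual implementation: the function name, the toy quadratic loss, and the hyperparameters (`eps`, `lr`) are all assumptions chosen for the demo.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, params, eps=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate (SPSA-style).

    Approximates the gradient of loss_fn at params using only two
    forward evaluations:
        g ~= (loss(p + eps*u) - loss(p - eps*u)) / (2*eps) * u
    where u is a random Gaussian perturbation. No back-propagation is
    run, so no intermediate activations need to be cached.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(params.shape)
    loss_plus = loss_fn(params + eps * u)
    loss_minus = loss_fn(params - eps * u)
    return (loss_plus - loss_minus) / (2.0 * eps) * u

# Toy usage: minimize ||p - target||^2 with forward-only updates.
target = np.array([1.0, -2.0, 0.5])
loss = lambda p: float(np.sum((p - target) ** 2))

p = np.zeros(3)
lr = 0.05
for step in range(500):
    g = zo_gradient_estimate(loss, p, seed=step)  # fresh perturbation per step
    p -= lr * g
```

In expectation the estimator recovers the true gradient (since `E[u uᵀ] = I`), which is why forward-only clients can still make consistent progress; in HERON-SFL this estimate replaces back-propagation only on the client side, while the server keeps exact first-order updates.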