SplitCom: Communication-efficient Split Federated Fine-tuning of LLMs via Temporal Compression

Federated fine-tuning of on-device large language models (LLMs) mitigates privacy concerns by preventing raw data sharing. However, its intensive computational and memory demands pose significant challenges for resource-constrained edge devices. Split federated learning (SFL) is a promising solution: it partitions the model into a lightweight client-side sub-model and a compute-intensive server-side sub-model, thus offloading the primary training workload to a powerful server. Nevertheless, exchanging high-dimensional activations in SFL incurs excessive communication overhead. To address this, we propose SplitCom, a communication-efficient SFL framework for LLMs that exploits temporal redundancy in activations across consecutive training epochs. Inspired by video compression, the core innovation of our framework is selective activation uploading: a client uploads activations only when they deviate noticeably from those of previous epochs. To balance communication efficiency and learning performance, we introduce two adaptive threshold control schemes based on (1) bang-bang control or (2) deep deterministic policy gradient (DDPG)-based reinforcement learning. Moreover, we apply dimensionality reduction techniques to alleviate client-side memory requirements. Furthermore, we extend SplitCom to the U-shape architecture, ensuring the server never accesses clients' labels. Extensive simulations and laboratory experiments demonstrate that SplitCom reduces uplink communication costs by up to 98.6% in its standard configuration and total communication costs by up to 95.8% in its U-shape variant without noticeably compromising model performance.
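
The selective-upload idea and a bang-bang threshold rule can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the relative L2 deviation metric, the per-sample cache keyed by a sample id, the names `ActivationGate` and `bang_bang_threshold`, the two threshold levels, and the rule of switching on whether the loss rose or fell.

```python
import math


def relative_deviation(current, cached):
    """L2 distance between two activation vectors, normalized by the cached norm.

    Assumed metric for illustration; the abstract does not specify one.
    """
    diff = math.sqrt(sum((c - p) ** 2 for c, p in zip(current, cached)))
    norm = math.sqrt(sum(p * p for p in cached))
    return diff / (norm + 1e-12)


class ActivationGate:
    """Hypothetical client-side gate that decides whether to upload activations."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.cache = {}  # sample id -> activation most recently uploaded

    def should_upload(self, sample_id, activation):
        cached = self.cache.get(sample_id)
        # Upload on first sight, or when the activation has drifted past the threshold.
        if cached is None or relative_deviation(activation, cached) > self.threshold:
            self.cache[sample_id] = list(activation)
            return True
        # Otherwise skip the upload; the server would reuse its stored copy.
        return False


def bang_bang_threshold(prev_loss, curr_loss, low=0.01, high=0.2):
    """Two-level (bang-bang) switch between a strict and a loose threshold.

    Assumed rule: if the loss rose, pick the low threshold (upload more often);
    otherwise pick the high threshold (upload less often).
    """
    return low if curr_loss > prev_loss else high


gate = ActivationGate(threshold=0.1)
first = gate.should_upload(0, [1.0, 0.0])    # nothing cached yet, so upload
second = gate.should_upload(0, [1.01, 0.0])  # small drift from the cached copy
third = gate.should_upload(0, [2.0, 0.0])    # large drift from the cached copy
gate.threshold = bang_bang_threshold(prev_loss=1.0, curr_loss=1.2)
```

In this sketch a skipped upload leaves the cache untouched, so deviation is always measured against the last activation the server actually received, much as video codecs measure change against a reference frame. A DDPG-based controller would replace `bang_bang_threshold` with a learned policy that outputs a continuous threshold.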
