Federated split learning has been identified as an efficient approach to addressing the computational resource constraints of clients in classical federated learning while guaranteeing data privacy for distributed model training across data owners. However, this training strategy faces several critical challenges when applied to fine-tuning large language models (LLMs). These include adaptively setting the cut layer across different clients to cope with data and device heterogeneity, which significantly affects system performance, as well as efficiently reducing the communication overhead during fine-tuning. To the best of our knowledge, no existing work addresses these challenges. To bridge this gap, we propose SplitFT, an adaptive federated split learning system for LLM fine-tuning. SplitFT enables each client to set its own cut layer according to its computational resources and the performance of its trained model, and it reduces the LoRA rank at the cut layer to lower the communication overhead. In addition, to simulate the heterogeneous data of real-world applications in our proposed split federated learning system, we propose a length-based Dirichlet approach that partitions the training data across clients. Extensive experimental results on various popular benchmarks show that our approach outperforms state-of-the-art approaches in both fine-tuning time efficiency and model performance.
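The abstract does not spell out how the length-based Dirichlet partition operates. The following minimal Python sketch illustrates one plausible reading, in which examples are bucketed by token length and each bucket is split across clients with Dirichlet-distributed shares; the function name, the number of buckets, and the concentration value alpha are our own assumptions, not details taken from the paper.

```python
import numpy as np

def length_dirichlet_partition(lengths, num_clients, num_bins=10, alpha=0.5, seed=0):
    """Partition example indices across clients with a length-based Dirichlet split.

    Examples are grouped into `num_bins` buckets by token length; within each
    bucket, per-client shares are drawn from Dirichlet(alpha), so a small alpha
    gives each client a skewed mix of sequence lengths (heterogeneous data).
    """
    rng = np.random.default_rng(seed)
    lengths = np.asarray(lengths)
    # Assign each example to a length bucket (quantile edges keep buckets balanced).
    edges = np.quantile(lengths, np.linspace(0, 1, num_bins + 1)[1:-1])
    bins = np.digitize(lengths, edges)

    client_indices = [[] for _ in range(num_clients)]
    for b in range(num_bins):
        idx = np.flatnonzero(bins == b)
        rng.shuffle(idx)
        # Dirichlet-distributed shares of this length bucket across clients.
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for c, part in enumerate(np.split(idx, cuts)):
            client_indices[c].extend(part.tolist())
    return client_indices

# Example: split 1,000 samples with synthetic token lengths across 4 clients.
if __name__ == "__main__":
    fake_lengths = np.random.default_rng(1).integers(16, 2048, size=1000)
    parts = length_dirichlet_partition(fake_lengths, num_clients=4, alpha=0.3)
    print([len(p) for p in parts])
```

Lower alpha values make the per-client length distributions more skewed, mimicking stronger data heterogeneity across devices; the sketch is only an illustration of the partitioning idea, not the system's actual implementation.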