The scalability of large language models (LLMs) to high-complexity architectures and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is that high-quality public datasets are expected to be depleted within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm has recently been proposed to facilitate collaborative LLM fine-tuning on distributed private data, where multiple data owners collaboratively fine-tune a shared LLM without sharing raw data. However, the staggering model size of LLMs imposes heavy computing and communication burdens on clients, posing significant barriers to the democratization of the FL LLM fine-tuning paradigm. To address this issue, split learning (SL) has emerged as a promising solution: it offloads the primary training workload to a server via model partitioning, so that clients exchange only activations and activation gradients, which are far smaller than the entire LLM. Unfortunately, research on the SL LLM fine-tuning paradigm is still in its nascent stage. To fill this gap, in this paper, we propose the first SL LLM fine-tuning framework, named SplitLoRA. SplitLoRA is built on the split federated learning (SFL) framework, amalgamating the advantages of parallel training from FL and model splitting from SL, thereby greatly enhancing training efficiency. It is worth noting that SplitLoRA is the inaugural open-source benchmark for SL LLM fine-tuning, providing a foundation for research efforts dedicated to advancing SL LLM fine-tuning. Extensive simulations validate that SplitLoRA achieves target accuracy in significantly less time than state-of-the-art LLM fine-tuning frameworks, demonstrating its superior training performance. The project page is available at https://fduinc.github.io/splitlora/.
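To make the split-learning mechanism described above concrete, the following is a minimal NumPy sketch (not the SplitLoRA implementation) of one client/server split with LoRA adapters: the client runs the front portion of the model on its private data and sends only activations to the server; the server runs the remaining layers, computes the loss, and returns the activation gradients; both sides update only their low-rank LoRA factors while the base weights stay frozen. All layer sizes, ranks, and learning rates here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """A linear layer with a frozen base weight W and a trainable
    low-rank LoRA update (scale * B @ A)."""
    def __init__(self, d_in, d_out, r=2, alpha=4.0, lr=0.05):
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen
        self.A = rng.standard_normal((r, d_in)) * 0.01               # trainable
        self.B = np.zeros((d_out, r))                                # trainable
        self.scale = alpha / r
        self.lr = lr

    def forward(self, x):
        self.x = x  # cache input for the backward pass
        return x @ (self.W + self.scale * self.B @ self.A).T

    def backward(self, grad_out):
        # Gradients flow only into the LoRA factors; W stays frozen.
        dW_eff = grad_out.T @ self.x                       # (d_out, d_in)
        dB = self.scale * dW_eff @ self.A.T
        dA = self.scale * self.B.T @ dW_eff
        grad_in = grad_out @ (self.W + self.scale * self.B @ self.A)
        self.A -= self.lr * dA
        self.B -= self.lr * dB
        return grad_in                                     # sent back to client

# Split point: the client keeps the first block, the server hosts the rest.
client_block = LoRALinear(8, 16)
server_block = LoRALinear(16, 4)

x = rng.standard_normal((32, 8))   # client's private batch (never leaves client)
y = rng.standard_normal((32, 4))   # labels (illustrative regression target)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

initial_loss = mse(server_block.forward(client_block.forward(x)), y)

for step in range(50):
    act = client_block.forward(x)              # client -> server: activations only
    out = server_block.forward(act)
    grad_out = 2 * (out - y) / len(x)          # MSE gradient at the server
    grad_act = server_block.backward(grad_out) # server -> client: activation grads
    client_block.backward(grad_act)            # client updates its LoRA adapters

final_loss = mse(server_block.forward(client_block.forward(x)), y)
```

Note that the only tensors crossing the split boundary are `act` and `grad_act`, whose size is set by the cut-layer width and batch size rather than by the full model, which is the communication saving the abstract refers to.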