Private data holds promise for improving LLMs due to its high quality, but its scattered distribution across data silos, together with the high computational demands of LLMs, limits LLM deployment in federated environments. To address this, transformer-based federated split models have been proposed, which offload most model parameters to the server (or distributed clients) while retaining only a small portion on the client to preserve data privacy. Despite this design, such models still face three challenges: 1) Peer-to-peer key encryption struggles to secure transmitted vectors effectively; 2) The auto-regressive nature of LLMs forces federated split learning to train and infer sequentially, incurring high communication overhead; 3) Fixed partition points lack adaptability to downstream tasks. In this paper, we introduce FedSEA-LLaMA, a Secure, Efficient, and Adaptive federated split framework based on LLaMA2. First, we inject Gaussian noise into forward-pass hidden states to enable secure end-to-end vector transmission. Second, we employ attention-mask compression and KV-cache collaboration to reduce communication costs, accelerating both training and inference. Third, we allow users to dynamically adjust the partition points of the input/output blocks according to the requirements of the downstream task. Experiments on natural language understanding, summarization, and conversational QA tasks show that FedSEA-LLaMA maintains performance comparable to centralized LLaMA2 while achieving up to 8x speedups in training and inference. Further analysis of privacy attacks and of different partition points also demonstrates the effectiveness of FedSEA-LLaMA in terms of security and adaptability.
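To make the noise-injection step concrete, the sketch below shows one way a client could perturb its forward-pass hidden states before transmitting them; the helper name `add_gaussian_noise`, the noise scale `sigma`, and the commented split-point calls are illustrative assumptions, not the framework's actual implementation.

```python
import torch

def add_gaussian_noise(hidden_states: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Perturb client-side hidden states with element-wise Gaussian noise
    so that only noised activations leave the client."""
    noise = torch.randn_like(hidden_states) * sigma
    return hidden_states + noise

# Hypothetical client-side split forward pass (function names are placeholders):
# hidden = client_input_blocks(input_ids)          # first few transformer layers stay on the client
# hidden = add_gaussian_noise(hidden, sigma=0.1)   # raw activations are never transmitted
# server_output = send_to_server(hidden)           # remaining layers run on the server
```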