Split learning provides a practical paradigm for resource-constrained users to train Large Language Models (LLMs) by offloading computation-intensive layers to a server while keeping raw data local. However, existing privacy-preserving split learning methods still face a difficult trade-off among utility, privacy, efficiency, and stability. Specifically, these methods often suffer from substantial utility degradation, remain vulnerable to advanced data reconstruction attacks, incur prohibitive computational and communication overhead, or exhibit unstable performance across different tasks. In this paper, we propose MIXGUARD, a novel mixup-based privacy-preserving split learning framework for LLMs. MIXGUARD introduces token-level obfuscation, representation-level obfuscation, and adaptive gradient perturbation mechanisms, which operate jointly to preserve useful learning signals while preventing privacy leakage to the server. Technically, MIXGUARD first constructs a lightweight calibration model on a public dataset to refine the approximated target representation, and then applies this model during privacy-preserving fine-tuning on private data. We conduct extensive experiments on four classification tasks and four text generation tasks across multiple LLM families, model sizes, architectures, and fine-tuning strategies. The results show that MIXGUARD maintains model utility comparable to that of non-split training baselines, consistently achieves stronger privacy protection than existing split learning defense methods against state-of-the-art data reconstruction attacks, and remains robust under adaptive attack settings.