Large language models (LLMs) are increasingly deployed in privacy-sensitive domains, where users must balance the risk of data exposure through external APIs against the high computational cost of local deployment. Split learning has therefore emerged as a promising paradigm for LLM fine-tuning and inference under limited local resources. However, it introduces new privacy risks. Prior work primarily studies leakage of private input prompts, typically via inversion attacks on intermediate representations, while the potential for sensitive information leakage through generated responses remains largely unexplored. In this work, we unveil novel vulnerabilities of Split-LLM by presenting Patched Model Inversion with Dual-Sided Initialization (PIDI), a two-stage attack that simultaneously targets both private input prompts and output responses in Split-LLM settings. PIDI combines dual-sided initialization with a patched inversion strategy to handle long sequences, substantially outperforming prior inversion methods. To counter threats from both sides, we further propose the Adapter-based DualGuard with Mutual Information Defense (ADMI), which integrates an adapter-based local warmup strategy and mutual information regularization to provide strong empirical privacy protection with minimal impact on task performance. Extensive experiments across diverse tasks and models demonstrate that ADMI effectively defends against PIDI and other state-of-the-art inversion attacks. Our code is publicly available at https://github.com/FLAIR-THU/VFLAIR-LLM.