Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.
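The gap between a theoretical DP guarantee and measured attack success can be made concrete with a small sketch. The code below is an illustrative assumption, not the benchmark's actual attack or evaluation code: it scores a generic threshold-based membership inference attack by its true-positive rate (TPR) at a fixed false-positive rate (FPR), and compares that with the standard (ε, δ)-DP bound TPR ≤ e^ε · FPR + δ. The function names and the toy scores are hypothetical.

```python
import math


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Best TPR of a threshold attack whose FPR stays at or below target_fpr.

    Scores are hypothetical attack outputs; a higher score means
    "more likely a member of the adaptation set".
    """
    thresholds = sorted(set(member_scores) | set(nonmember_scores))
    best = 0.0
    for t in thresholds:
        # Predict "member" whenever the score is strictly above the threshold.
        fpr = sum(s > t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= target_fpr:
            tpr = sum(s > t for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


def dp_tpr_upper_bound(epsilon, delta, fpr):
    """Upper bound on any attack's TPR implied by (epsilon, delta)-DP."""
    return min(1.0, math.exp(epsilon) * fpr + delta)


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    members = [0.9, 0.8, 0.7, 0.2]
    nonmembers = [0.6, 0.3, 0.1, 0.05]
    empirical = tpr_at_fpr(members, nonmembers, target_fpr=0.25)
    bound = dp_tpr_upper_bound(epsilon=1.0, delta=1e-5, fpr=0.25)
    print(f"empirical TPR at FPR <= 0.25: {empirical:.2f}, DP bound: {bound:.2f}")
```

An empirical TPR well below the bound suggests the attack extracts less than the guarantee permits. Two adaptation setups with the same ε can still differ in measured TPR, which is the kind of difference the abstract reports across data distributions.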
