Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, which provides formal privacy guarantees. However, the practical effectiveness of DP adaptation remains unclear, partly because LLMs are pretrained: overlaps and interdependencies between pretraining data and adaptation data can undermine privacy despite DP. To analyze this issue in practice, we investigate privacy risks under DP adaptation of LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes affect vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.
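The gap between a theoretical DP guarantee and practical privacy risk can be illustrated with a standard auditing identity: any (ε, δ)-DP mechanism bounds a membership inference attacker by TPR ≤ e^ε · FPR + δ, so observed attack rates imply a lower bound on ε. The sketch below is illustrative only and is not the evaluation procedure of this work; the function name `empirical_epsilon_lower_bound` and the example rates are assumptions, and it uses point estimates where a real audit would use confidence intervals on TPR and FPR.

```python
import math


def empirical_epsilon_lower_bound(tpr: float, fpr: float, delta: float = 0.0) -> float:
    """Lower-bound epsilon from a membership inference attack's TPR and FPR.

    Hypothetical helper for illustration. Uses the hypothesis-testing view
    of (eps, delta)-DP:
        TPR <= e^eps * FPR + delta
        TNR <= e^eps * FNR + delta
    Inputs are treated as exact rates (no confidence intervals).
    """
    if not (0.0 <= tpr <= 1.0 and 0.0 <= fpr <= 1.0 and 0.0 <= delta < 1.0):
        raise ValueError("tpr and fpr must be in [0, 1], delta in [0, 1)")

    fnr = 1.0 - tpr
    tnr = 1.0 - fpr
    bounds = [0.0]  # epsilon is never negative

    # First inequality: eps >= ln((TPR - delta) / FPR)
    if tpr - delta > 0.0:
        bounds.append(math.inf if fpr == 0.0 else math.log((tpr - delta) / fpr))

    # Symmetric inequality: eps >= ln((TNR - delta) / FNR)
    if tnr - delta > 0.0:
        bounds.append(math.inf if fnr == 0.0 else math.log((tnr - delta) / fnr))

    return max(bounds)


# Made-up rates: an attack with 10% TPR at 1% FPR implies eps >= ln(10).
eps_hat = empirical_epsilon_lower_bound(tpr=0.10, fpr=0.01)
```

Comparing such an empirical lower bound with the ε reported by the DP accountant is one way to quantify how much of the theoretical budget an attack actually exploits; an attack at chance level (TPR equal to FPR) yields a bound of 0.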