Large language models (LLMs) exhibit powerful capabilities but risk memorizing sensitive personally identifiable information (PII) from their training data, raising serious privacy concerns. While machine unlearning techniques aim to remove such data, they predominantly depend on access to the training data. This requirement is often impractical, as training data in real-world deployments is commonly proprietary or inaccessible. To address this limitation, we propose Data-Free Selective Unlearning (DFSU), a novel privacy-preserving framework that removes sensitive PII from an LLM without requiring its training data. Our approach first synthesizes pseudo-PII through language model inversion, then constructs token-level privacy masks for these synthetic samples, and finally performs token-level selective unlearning via a contrastive mask loss within a low-rank adaptation (LoRA) subspace. Extensive experiments on the AI4Privacy PII-Masking dataset using Pythia models demonstrate that our method effectively removes target PII while maintaining model utility.
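To make the token-level selective unlearning step concrete, the following is a minimal sketch of a contrastive mask loss in PyTorch. The abstract does not specify the exact objective, so the formulation below is an assumption: per-token cross-entropy is raised (gradient ascent) on tokens flagged by the privacy mask and preserved on all other tokens. The names `pii_mask` and `alpha` are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_mask_loss(logits, labels, pii_mask, alpha=1.0):
    """Hypothetical token-level unlearning objective.

    logits:   (B, T, V) model outputs (e.g., from a LoRA-adapted model)
    labels:   (B, T) target token ids
    pii_mask: (B, T) float mask, 1.0 on PII tokens to forget, 0.0 elsewhere
    alpha:    weight on the forgetting term (assumed hyperparameter)
    """
    # Per-token negative log-likelihood, kept unreduced so the mask
    # can separate "forget" tokens from "retain" tokens.
    tok_nll = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
    ).view(labels.shape)

    # Mean NLL over masked (PII) tokens and over unmasked tokens.
    forget = (tok_nll * pii_mask).sum() / pii_mask.sum().clamp(min=1)
    retain = (tok_nll * (1 - pii_mask)).sum() / (1 - pii_mask).sum().clamp(min=1)

    # Minimizing this loss lowers likelihood on PII tokens (the -forget
    # term acts as gradient ascent) while keeping likelihood elsewhere.
    return retain - alpha * forget
```

In practice this loss would be backpropagated only through the LoRA parameters, so the base model weights stay frozen and the unlearning update lives in the low-rank subspace.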