Large Language Models (LLMs) have proven powerful, but the risk of privacy leakage remains a significant concern. Traditional privacy-preserving methods, such as Differential Privacy and Homomorphic Encryption, are inadequate for black-box, API-only settings because they require either model transparency or heavy computational resources. We propose Prompt2Forget (P2F), the first framework designed to tackle the LLM local privacy challenge by teaching the LLM to forget. The method decomposes a full question into smaller segments, generates fabricated answers, and obfuscates the model's memory of the original input. We construct a benchmark dataset of questions containing privacy-sensitive information from diverse fields. P2F achieves zero-shot generalization, adapting to a wide range of use cases without manual adjustment. Experimental results show that P2F robustly obfuscates the LLM's memory, attaining a forgetfulness score of around 90% without any utility loss. This is an improvement of up to 63% over the naive direct-instruction technique, highlighting P2F's efficacy in mitigating the retention of sensitive information within LLMs. Our findings establish the first benchmark for the novel LLM forgetting task, a meaningful advance in privacy preservation for LLMs.