Large Language Models (LLMs) are powerful tools with extensive applications, but their tendency to memorize private information raises significant concerns, since memorized private data can easily leak at inference time. In this paper, we introduce Private Association Editing (PAE), a novel defense against private data leakage. PAE is designed to effectively remove Personally Identifiable Information (PII) without retraining the model. Our approach consists of a four-step procedure: detecting memorized PII, applying PAE cards to mitigate the memorization of private data, verifying resilience to targeted data extraction (TDE) attacks, and ensuring consistency in the post-edit LLMs. Because PAE supports batch modifications, it offers a versatile and efficient way to strengthen data privacy in LLMs. Experimental results demonstrate the effectiveness of PAE in mitigating private data leakage. We believe PAE will serve as a critical tool in the ongoing effort to protect data privacy in LLMs, encouraging the development of safer models for real-world applications.
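To make the four-step procedure concrete, the following is a minimal, self-contained sketch of its control flow. Every name here (detect_memorized_pii, apply_pae_card, resists_tde) and the toy lookup-table "model" are illustrative assumptions, not the paper's actual API: a real PAE card edits model weights, whereas this sketch only mirrors the pipeline's structure.

```python
# Illustrative sketch of the four-step PAE pipeline; all helpers and
# the dict-based "model" are hypothetical stand-ins, not the authors' code.
import re
from typing import Dict, List

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def detect_memorized_pii(generate, prompts: List[str]) -> List[str]:
    """Step 1: probe the model with training-style prefixes and collect
    any PII patterns (here: email addresses) it emits verbatim."""
    leaked = []
    for p in prompts:
        leaked += EMAIL.findall(generate(p))
    return leaked

def apply_pae_card(associations: Dict[str, str], pii: str) -> None:
    """Step 2: a PAE card breaks the private association; here we swap
    in a placeholder (a stand-in for the real batch weight edit)."""
    for subject, value in associations.items():
        if value == pii:
            associations[subject] = "[REDACTED]"

def resists_tde(generate, prompts: List[str], pii: List[str]) -> bool:
    """Step 3: re-run the targeted data extraction (TDE) probes and
    confirm that none of the known PII is emitted any more."""
    return not any(x in generate(p) for p in prompts for x in pii)

# Toy "model": a lookup table of subject -> attribute associations.
associations = {"Contact Alice at": "alice@example.com"}
generate = lambda p: p + " " + associations.get(p, "")

probes = ["Contact Alice at"]
pii = detect_memorized_pii(generate, probes)   # step 1
for x in pii:
    apply_pae_card(associations, x)            # step 2
assert resists_tde(generate, probes, pii)      # step 3
# Step 4 (consistency) would compare the edited and original models on
# unrelated prompts (e.g. perplexity drift); omitted for brevity.
```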