Large Language Models are trained on extensive datasets that often contain sensitive, human-generated information, raising significant concerns about privacy breaches. While certified unlearning approaches offer strong privacy guarantees, they rely on restrictive model assumptions that do not apply to LLMs. As a result, various unlearning heuristics have been proposed, with their associated privacy risks assessed only empirically. Standard evaluation pipelines typically select data for removal at random from the training set, apply an unlearning technique, and use membership inference attacks (MIAs) to compare the unlearned model against a model retrained without the to-be-unlearned data. However, since every data point is subject to the right to be forgotten, unlearning should be assessed under the worst case from the privacy perspective. Prior work shows that data outliers may exhibit stronger memorization effects. Intuitively, such points are harder to unlearn, and thus the privacy risk of unlearning them is underestimated in current evaluations. In this paper, we leverage minority data to expose this critical flaw in previously widely adopted evaluations. We substantiate this claim through carefully designed experiments, including unlearning canaries related to minority groups, inspired by the privacy auditing literature. Using personally identifiable information as a representative minority identifier, we demonstrate that minority groups experience at least 20% more privacy leakage in most cases across six unlearning approaches, three MIAs, three benchmark datasets, and two LLMs of different scales. Given that the right to be forgotten should be upheld for every individual, we advocate for a more rigorous evaluation of LLM unlearning methods. Our minority-aware evaluation framework represents an initial step toward ensuring more equitable assessments of LLM unlearning efficacy.