Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning

Machine unlearning has become a promising solution for fulfilling the "right to be forgotten", under which individuals can request the deletion of their data from machine learning models. However, existing studies of machine unlearning mainly focus on the efficacy and efficiency of unlearning methods, while neglecting the privacy vulnerabilities that arise during the unlearning process. With two versions of a model available to an adversary, that is, the original model and the unlearned model, machine unlearning opens up a new attack surface. In this paper, we conduct the first investigation into the extent to which machine unlearning can leak the confidential content of the unlearned data. Specifically, under the Machine Learning as a Service (MLaaS) setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample by accessing only the original and unlearned models. The effectiveness of the proposed unlearning inversion attacks is evaluated through extensive experiments on benchmark datasets, across various model architectures, and on representative exact and approximate unlearning approaches. The experimental results indicate that the proposed attacks can reveal sensitive information about the unlearned data. We therefore identify three possible defenses that help to mitigate the proposed attacks, albeit at the cost of reducing the utility of the unlearned model. The study in this paper uncovers an underexplored gap between machine unlearning and the privacy of unlearned data, highlighting the need for careful design of mechanisms that implement unlearning without leaking information about the unlearned data.
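To make the "two model versions" attack surface concrete, the following is a minimal toy sketch and not the attack proposed in the paper. It assumes a binary logistic-regression model, non-negative features, and a hypothetical approximate unlearning rule consisting of a single gradient-ascent step on the forgotten sample; the names `grad_logloss`, `eta`, and the synthetic data are all illustrative assumptions. Under those assumptions, the difference between the two weight vectors is proportional to the forgotten sample's features, and its sign gives away the label.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logloss(w, x, y):
    # Gradient of the logistic loss for one sample (x, y) with respect to w.
    return (sigmoid(w @ x) - y) * x

rng = np.random.default_rng(0)

# Synthetic training set with non-negative features (assumption of this sketch).
X = rng.uniform(0.0, 1.0, size=(200, 8))
true_w = rng.normal(size=8)
scores = X @ true_w
y = (scores > np.median(scores)).astype(float)

# "Original" model: plain gradient descent on the full training set.
w_orig = np.zeros(8)
for _ in range(500):
    w_orig -= 0.5 * (X.T @ (sigmoid(X @ w_orig) - y)) / len(X)

# Hypothetical approximate unlearning: one gradient-ascent step on the target.
target_x, target_y = X[17], y[17]
eta = 0.1
w_unlearned = w_orig + eta * grad_logloss(w_orig, target_x, target_y)

# Adversary's view: only the two weight vectors.
delta = w_unlearned - w_orig  # equals eta * (p - y) * x

# With non-negative features, delta is negative when y = 1 and positive when y = 0.
recovered_label = 1.0 if delta.sum() < 0 else 0.0

# The feature vector is recovered up to scale.
recovered_direction = np.abs(delta) / np.linalg.norm(delta)
cosine = recovered_direction @ (target_x / np.linalg.norm(target_x))

print("recovered label:", recovered_label, "true label:", target_y)
print("cosine similarity to forgotten sample:", cosine)
```

Real unlearning methods and deep models do not leak this cleanly; the sketch only shows why releasing both models can expose information that neither model exposes alone.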

