Machine unlearning has become a promising solution for fulfilling the "right to be forgotten", under which individuals can request the deletion of their data from machine learning models. However, existing studies of machine unlearning mainly focus on the efficacy and efficiency of unlearning methods, while overlooking the privacy vulnerabilities that arise during the unlearning process. Because an adversary may have access to two versions of a model, namely the original model and the unlearned model, machine unlearning opens up a new attack surface. In this paper, we conduct the first investigation of the extent to which machine unlearning can leak the confidential content of the unlearned data. Specifically, under the Machine Learning as a Service (MLaaS) setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample by accessing only the original and unlearned models. We evaluate the effectiveness of the proposed unlearning inversion attacks through extensive experiments on benchmark datasets across various model architectures, against representative exact and approximate unlearning approaches. The experimental results indicate that the proposed attacks can reveal sensitive information about the unlearned data. We further identify three possible defenses that help mitigate the proposed attacks, albeit at the cost of reduced utility of the unlearned model. This study uncovers an underexplored gap between machine unlearning and the privacy of unlearned data, highlighting the need to carefully design unlearning mechanisms that do not leak information about the unlearned data.
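Why two model versions form an attack surface can be illustrated with a deliberately simple toy setting. This is not the attack proposed in this paper, only a sketch under stated assumptions: a linear model trained by a single full-batch gradient step on the summed squared loss starting from zero weights, exact unlearning by retraining without the forgotten sample, labels in {-1, +1}, and non-negative features (e.g., pixel intensities). Under these assumptions the parameter difference satisfies `w_original - w_unlearned = lr * y_u * x_u`, so it exposes both the direction of the unlearned feature vector and its label. All function names below are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_one_step(data, dim, lr):
    """Hypothetical toy trainer: one full-batch gradient step on the summed
    squared loss 0.5 * (w.x - y)^2, starting from zero weights."""
    w0 = [0.0] * dim
    grad = [0.0] * dim
    for x, y in data:
        residual = dot(w0, x) - y  # equals -y because w0 is all zeros
        for j in range(dim):
            grad[j] += residual * x[j]
    return [w - lr * g for w, g in zip(w0, grad)]


def invert_unlearned_sample(w_original, w_unlearned):
    """Recover the direction of x_u and the label y_u from two model versions.
    Under the toy assumptions, w_original - w_unlearned == lr * y_u * x_u."""
    delta = [a - b for a, b in zip(w_original, w_unlearned)]
    # Non-negative features mean the entries of delta share the sign of y_u.
    label = 1 if sum(delta) > 0 else -1
    norm = math.sqrt(dot(delta, delta))
    direction = [label * d / norm for d in delta]  # unit vector along x_u
    return direction, label


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


random.seed(0)
dim, lr = 5, 0.1
# Non-negative features (e.g. pixel intensities in [0, 1)) and labels in {-1, +1}.
data = [([random.random() for _ in range(dim)], random.choice([-1, 1]))
        for _ in range(20)]
forget_idx = 3
x_u, y_u = data[forget_idx]

w_original = train_one_step(data, dim, lr)
# Exact unlearning: retrain from scratch without the forgotten sample.
retain = data[:forget_idx] + data[forget_idx + 1:]
w_unlearned = train_one_step(retain, dim, lr)

recovered, label = invert_unlearned_sample(w_original, w_unlearned)
print("label recovered:", label == y_u)
print("cosine(recovered, x_u) =", round(cosine(recovered, x_u), 6))
```

If the adversary also knew `lr`, the exact vector `x_u = delta / (lr * y_u)` would follow. Real models trained with many nonlinear optimization steps do not satisfy this identity exactly, so the toy should be read only as intuition for how the difference between the original and unlearned models can carry information about the unlearned sample.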