Zero-Shot Machine Unlearning

Modern privacy regulations grant citizens the right to be forgotten by products, services and companies. In the case of machine learning (ML) applications, this requires deleting data not only from storage archives but also from ML models. As the need for regulatory compliance in ML applications grows, machine unlearning is emerging as a research problem. Right-to-be-forgotten requests take the form of removing a certain set or class of data from an already trained ML model. Practical considerations preclude retraining the model from scratch after discarding the deleted data. The few existing studies use the whole training data, a subset of the training data, or some metadata stored during training to update the model weights for unlearning. However, in many cases, no data related to the training process or training samples may be accessible for unlearning. We therefore ask the question: is it possible to achieve unlearning with zero training samples? In this paper, we introduce the novel problem of zero-shot machine unlearning, which caters for the extreme but practical scenario where zero original data samples are available for use. We then propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer. These methods remove the information of the forget data from the model while maintaining the model's efficacy on the retain data. The zero-shot approach offers good protection against model inversion attacks and membership inference attacks. We introduce a new evaluation metric, the Anamnesis Index (AIN), to effectively measure the quality of an unlearning method. The experiments show promising results for unlearning in deep learning models on benchmark vision datasets. The source code is available here: https://github.com/ayu987/zero-shot-unlearning

