As the right to be forgotten has been legislated worldwide, many studies have attempted to design unlearning mechanisms that protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning aims to make a trained model remove the contribution of an erased subset of its training dataset. This survey aims to systematically classify a wide range of machine unlearning methods and discuss their differences, connections, and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we cover it in two parts: first, we classify centralized unlearning into exact unlearning and approximate unlearning; second, we describe the techniques behind these methods in detail. Beyond centralized unlearning, we review studies on distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as two representative directions. After introducing unlearning methods, we review studies on unlearning verification. Moreover, because privacy and security issues are essential in machine unlearning, we organize the latest related literature on them. Finally, we discuss the challenges of the various unlearning scenarios and outline potential research directions.
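The sketch below illustrates the definition of machine unlearning and what "exact" unlearning means. It uses a deliberately toy model whose only parameter is the mean of its training values. This model is an illustrative assumption, not a method from the surveyed literature. Exact unlearning must produce the same model that retraining from scratch on the retained data would produce. Here that is possible without full retraining, because the hypothetical `MeanModel` keeps sufficient statistics (a running sum and count), so the erased subset's contribution can be subtracted exactly.

```python
from fractions import Fraction


class MeanModel:
    """Hypothetical toy model: its parameter is the mean of the training values.

    It stores sufficient statistics (sum and count) so that the contribution
    of any erased subset can be removed exactly. Integer inputs are assumed
    so that Fraction gives exact, comparable parameters.
    """

    def __init__(self, data):
        self.total = sum(data)
        self.count = len(data)

    @property
    def param(self):
        # Exact mean; raises ZeroDivisionError if every point has been erased.
        return Fraction(self.total, self.count)

    def unlearn(self, erased):
        # Exact unlearning: subtract the erased subset's contribution
        # from the stored statistics instead of retraining.
        self.total -= sum(erased)
        self.count -= len(erased)


def retrain_from_scratch(data, erased):
    # Naive baseline: rebuild the model on the retained data D \ D_u.
    remaining = list(data)
    for x in erased:
        remaining.remove(x)
    return MeanModel(remaining)


if __name__ == "__main__":
    data = [4, 8, 15, 16, 23, 42]
    erased = [15, 42]

    model = MeanModel(data)
    model.unlearn(erased)
    retrained = retrain_from_scratch(data, erased)

    # Exact unlearning: the unlearned model matches the retrained one.
    print(model.param, retrained.param, model.param == retrained.param)
```

For realistic models such as deep networks, no such cheap sufficient statistics exist. This is why, in the taxonomy above, exact methods often restructure training so that only part of the model is retrained, while approximate methods accept a model that is only close to the retrained one.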