Machine unlearning has emerged as a new paradigm for deliberately forgetting data samples from a given model in order to adhere to stringent regulations. However, existing machine unlearning methods have focused primarily on classification models, leaving unlearning for generative models relatively unexplored. This paper bridges that gap by providing a unifying framework of machine unlearning for image-to-image generative models. Within this framework, we propose a computationally efficient algorithm, underpinned by rigorous theoretical analysis, that incurs negligible performance degradation on the retain samples while effectively removing the information from the forget samples. Empirical studies on two large-scale datasets, ImageNet-1K and Places-365, further show that our algorithm does not rely on the availability of the retain samples, which further aligns with data retention policies. To the best of our knowledge, this work is the first to present systematic, theoretical, and empirical explorations of machine unlearning specifically tailored for image-to-image generative models. Our code is available at https://github.com/jpmorganchase/l2l-generator-unlearning.