Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks

Growing concerns about privacy and regulatory compliance have drawn increased attention to regulating the outputs of deep generative models and highlighted the need for effective control over them. This need arises because generative models can produce outputs containing undesirable, offensive, or potentially harmful content. Machine unlearning has emerged to tackle this challenge: it aims to forget specific learned information or to erase the influence of undesired data subsets from a trained model. The objective of this work is to prevent a pre-trained GAN from generating outputs that contain undesired features when the underlying training dataset is inaccessible. Our approach is inspired by a crucial observation: the parameter space of GANs exhibits meaningful directions that can be leveraged to suppress specific undesired features. However, moving along such directions usually degrades the quality of the generated samples. Our proposed method, 'Adapt-then-Unlearn', unlearns such undesired features while maintaining the quality of generated samples. The method has two stages. In the first stage, we adapt the pre-trained GAN using negative samples provided by the user. In the second stage, we unlearn the undesired feature by training the pre-trained GAN on positive samples with a repulsion regularizer. This regularizer encourages the model's parameters to stay away from the parameters of the adapted model obtained in the first stage, while the training on positive samples maintains the quality of generated samples. To the best of our knowledge, ours is the first method to address unlearning in GANs. We validate the effectiveness of our method through comprehensive experiments.
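The second-stage objective described above can be illustrated with a minimal sketch. The abstract does not give the functional form of the repulsion regularizer, so the exponential of the negative squared distance used here is only one plausible choice. The names `repulsion_penalty` and `unlearning_objective` and the weights `gamma` and `alpha` are hypothetical, and the parameters are represented as flat lists of floats rather than real GAN weights.

```python
import math


def squared_distance(theta, theta_adapted):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(theta, theta_adapted))


def repulsion_penalty(theta, theta_adapted, alpha=1.0):
    """Assumed regularizer form: exp(-alpha * ||theta - theta_adapted||^2).

    The penalty is largest (1.0) when the current parameters coincide with
    the adapted (stage-one) parameters and decays toward 0 as they move apart,
    so minimizing it pushes theta away from theta_adapted.
    """
    return math.exp(-alpha * squared_distance(theta, theta_adapted))


def unlearning_objective(positive_loss, theta, theta_adapted, gamma=1.0, alpha=1.0):
    """Stage-two loss sketch: loss on positive samples plus weighted repulsion.

    `positive_loss` stands in for whatever adversarial loss is computed on
    the positive samples; it is passed in as a plain number here.
    """
    return positive_loss + gamma * repulsion_penalty(theta, theta_adapted, alpha)


if __name__ == "__main__":
    theta_adapted = [0.0, 0.0]  # hypothetical parameters after stage one
    near = [0.1, 0.0]           # close to the adapted model
    far = [2.0, 0.0]            # far from the adapted model
    print(repulsion_penalty(near, theta_adapted))
    print(repulsion_penalty(far, theta_adapted))
```

Under this assumed form, parameters close to the adapted model incur a penalty near 1 and distant parameters a penalty near 0, which is the qualitative behavior the repulsion term is described as having.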
