As machine learning advances and data misuse scandals become more frequent, individuals are increasingly concerned about their personal information and are advocating for the right to have their data removed. Machine unlearning has emerged as a way to erase training data from trained machine learning models. Although it has been successful for classifiers, research on unlearning in Generative Adversarial Networks (GANs) remains limited because of their distinctive architecture, which comprises a generator and a discriminator. One challenge concerns generator unlearning: the process could disrupt the continuity and completeness of the latent space, which in turn might reduce the model's effectiveness after unlearning. Another challenge is how to define the criterion the discriminator should satisfy on the images to be unlearned. In this paper, we introduce a substitution mechanism and define a fake label to mitigate these challenges. Building on the substitution mechanism and the fake label, we propose a cascaded unlearning approach for both item and class unlearning in GAN models, in which the unlearning and learning processes run in a cascaded manner. We evaluate cascaded unlearning comprehensively on the MNIST and CIFAR-10 datasets. Experimental results show that the approach substantially improves item and class unlearning efficiency, reducing the required time by up to 185x on MNIST and 284x on CIFAR-10 compared with retraining from scratch. Although the model's performance degrades slightly after unlearning, the degradation is negligible when a small number of images (e.g., 64) is unlearned, and it has no adverse effect on downstream tasks such as classification.