The rapid advancement of text-to-image Diffusion Models has led to their widespread public accessibility. However, these models, trained on large internet datasets, can sometimes generate undesirable outputs. To mitigate this, approximate Machine Unlearning algorithms have been proposed to modify model weights so as to reduce the generation of specific types of images, characterized by samples from a ``forget distribution'', while preserving the model's ability to generate other images, characterized by samples from a ``retain distribution''. While these methods aim to minimize the influence of training data in the forget distribution without extensive additional computation, we show that they can compromise the model's integrity by inadvertently affecting generation for images in the retain distribution. Recognizing the limitations of FID and CLIPScore in capturing these effects, we introduce a novel retention metric that directly assesses the perceptual difference between outputs generated by the original and the unlearned models. We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines. Given their straightforward implementation, these algorithms serve as valuable benchmarks for future advancements in approximate Machine Unlearning for Diffusion Models.
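The retention metric described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper `perceptual_distance` is a hypothetical stand-in (here a simple per-pixel L1 distance) for a learned perceptual metric such as LPIPS, and the pairing of outputs by shared prompt and seed is an assumption about how such a comparison would be set up.

```python
import numpy as np

def perceptual_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    # Stand-in for a learned perceptual metric (e.g. LPIPS); the paper's
    # actual choice of metric is not specified here. Per-pixel L1 distance
    # is used purely for illustration.
    return float(np.mean(np.abs(img_a - img_b)))

def retention_score(original_outputs, unlearned_outputs) -> float:
    # Average perceptual difference between images generated by the
    # original and the unlearned model from the SAME prompts and seeds
    # drawn from the retain distribution. Lower means the unlearned model
    # better preserves the original model's behavior on retained content.
    dists = [perceptual_distance(a, b)
             for a, b in zip(original_outputs, unlearned_outputs)]
    return float(np.mean(dists))

# Toy example with random "images" in [0, 1]:
rng = np.random.default_rng(0)
originals = [rng.random((64, 64, 3)) for _ in range(4)]
identical = [img.copy() for img in originals]          # perfect retention
perturbed = [np.clip(img + 0.1, 0.0, 1.0) for img in originals]

print(retention_score(originals, identical))  # 0.0: outputs unchanged
print(retention_score(originals, perturbed))  # > 0: retention degraded
```

Unlike FID (which compares distributions) or CLIPScore (which measures text-image alignment), this pairwise formulation detects per-sample drift even when aggregate statistics remain unchanged.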