Text-to-image models such as Stable Diffusion have shown impressive image synthesis capabilities, thanks to large-scale training datasets. However, these datasets may contain sexually explicit, copyrighted, or otherwise undesirable content, which the model can then reproduce directly. Because retraining these large models for every concept-deletion request is infeasible, fine-tuning algorithms have been developed to erase concepts from diffusion models. While these algorithms erase concepts effectively, each exhibits one of the following issues: 1) a corrupted feature space leads to the synthesis of disintegrated objects, 2) the generated images diverge from the originally synthesized content in both spatial structure and semantics, and 3) suboptimal training updates make the model more susceptible to utility degradation. These issues severely degrade the original utility of generative models. In this work, we present a new approach that addresses all of these challenges. Inspired by classifier guidance, we propose a surgical update to the classifier guidance term while constraining the drift of the unconditional score term. Furthermore, our algorithm lets the user select an alternative concept to replace the erased one, providing greater controllability. Our experimental results show that our algorithm not only erases the target concept effectively but also preserves the model's generation capability.
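The split between a guidance term and an unconditional score term can be illustrated with a toy sketch. It assumes that noise predictions are plain NumPy arrays standing in for a diffusion model's outputs, and that the (implicit) guidance term is the conditional prediction minus the unconditional one, as in classifier-free guidance. The objective below is a hypothetical illustration, not the paper's loss. It pulls the fine-tuned model's guidance for the erased concept toward the frozen model's guidance for a user-chosen replacement (anchor) concept, and it penalizes drift of the fine-tuned unconditional prediction away from the frozen one. The function names and the weight `lam` are assumptions.

```python
import numpy as np


def guidance_term(eps_cond, eps_uncond):
    """Implicit guidance direction: conditional minus unconditional noise prediction."""
    return eps_cond - eps_uncond


def erasure_loss(eps_new_target, eps_new_uncond,
                 eps_orig_anchor, eps_orig_uncond, lam=1.0):
    """Hypothetical erasure objective (illustration only, not the paper's exact loss).

    eps_new_*  : predictions of the model being fine-tuned
    eps_orig_* : predictions of the frozen original model
    lam        : assumed weight of the unconditional-drift penalty
    """
    # Guidance term: the fine-tuned model's guidance for the erased concept
    # should match the frozen model's guidance for the anchor concept.
    g_new = guidance_term(eps_new_target, eps_new_uncond)
    g_anchor = guidance_term(eps_orig_anchor, eps_orig_uncond)
    guidance_loss = np.mean((g_new - g_anchor) ** 2)

    # Drift term: keep the unconditional prediction close to the original model's.
    drift_loss = np.mean((eps_new_uncond - eps_orig_uncond) ** 2)

    return guidance_loss + lam * drift_loss


# Toy usage with 4-dimensional stand-ins for noise predictions.
zeros, ones = np.zeros(4), np.ones(4)
print(erasure_loss(ones, zeros, zeros, zeros))          # guidance mismatch only
print(erasure_loss(ones, ones, zeros, zeros, lam=2.0))  # unconditional drift only
```

When the fine-tuned guidance already matches the anchor's and the unconditional prediction is unchanged, the loss is zero. Otherwise the residual is split between the two terms, with `lam` setting how strongly the unconditional score is held in place relative to the guidance update.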