Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. As a result, the model no longer generates the target concept when given the corresponding text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
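The distribution-matching idea can be illustrated with a toy sketch. This is not the paper's implementation: the "denoiser" here is a hypothetical per-prompt linear function, `predict_noise` and `ablate_concept` are illustrative names, and the random scalar `x_t` only stands in for a noised image. What the sketch does show is the general shape of such an objective: a frozen copy of the model, conditioned on the anchor prompt, provides the regression target, and only the prediction for the target prompt is updated toward it.

```python
import random


def predict_noise(params, x_t, prompt):
    """Toy stand-in for a text-conditioned denoiser (an assumption, not a real model).

    Each prompt owns a (scale, offset) pair; the prediction is scale * x_t + offset.
    """
    scale, offset = params[prompt]
    return scale * x_t + offset


def ablate_concept(params, target, anchor, steps=2000, lr=0.05, seed=0):
    """Fine-tune the target prompt's parameters so its prediction matches
    the frozen model's prediction for the anchor prompt."""
    rng = random.Random(seed)
    frozen = dict(params)  # frozen copy: plays the role of a stop-gradient teacher
    tuned = dict(params)   # copy that receives the update; the input dict is untouched
    scale, offset = tuned[target]

    for _ in range(steps):
        x_t = rng.gauss(0.0, 1.0)  # stands in for a noised sample
        teacher = predict_noise(frozen, x_t, anchor)
        residual = (scale * x_t + offset) - teacher
        # Gradient step on residual**2 with respect to the target's parameters only.
        scale -= lr * 2.0 * residual * x_t
        offset -= lr * 2.0 * residual

    tuned[target] = (scale, offset)
    return tuned


# Hypothetical prompts: "grumpy cat" is the concept to ablate, "cat" is its anchor.
params = {"cat": (1.0, 0.0), "grumpy cat": (1.8, -0.7), "dog": (0.4, 0.3)}
tuned = ablate_concept(params, target="grumpy cat", anchor="cat")
```

In this sketch only the target prompt's parameters are trained, so an unrelated prompt such as `"dog"` is unchanged by construction. In a real diffusion model the weights are shared across prompts, so whether related concepts are preserved has to be checked empirically, as the experiments described above do.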