Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, which often contains copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Given a target style, instance, or text prompt that we wish to ablate, our algorithm learns to match its image distribution to the distribution corresponding to an anchor concept. This prevents the model from generating the target concept when it is given the text condition for that concept. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
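The distribution-matching idea can be illustrated with a deliberately tiny toy, which is not the training procedure of the method itself. The sketch below assumes each concept's "image distribution" is a one-dimensional, unit-variance Gaussian whose mean is the only learnable parameter. It then moves the target concept's distribution onto a frozen anchor's by gradient descent on the KL divergence. The function names, concept labels, learning rate, and step count are all hypothetical and chosen only for illustration.

```python
# Toy sketch with hypothetical names: NOT a diffusion model. Each concept's
# "image distribution" is assumed to be N(mean, 1) on the real line.

def kl_unit_gaussians(mu_p: float, mu_q: float) -> float:
    """KL(N(mu_p, 1) || N(mu_q, 1)) for two unit-variance Gaussians."""
    return 0.5 * (mu_p - mu_q) ** 2


def ablate_concept(means, target, anchor, lr=0.1, steps=200):
    """Return a copy of `means` with the target matched to the anchor."""
    updated = dict(means)      # leave the caller's "pretrained" model intact
    mu_anchor = means[anchor]  # frozen: the anchor is never updated
    for _ in range(steps):
        # Gradient of 0.5 * (mu_anchor - mu_target)^2 w.r.t. mu_target.
        grad = updated[target] - mu_anchor
        updated[target] -= lr * grad  # only the target parameter changes
    return updated


# Illustrative labels only: a specific instance is mapped to a broader anchor.
pretrained = {"grumpy cat": 4.0, "cat": 1.0, "dog": -2.0}
ablated = ablate_concept(pretrained, target="grumpy cat", anchor="cat")
print(kl_unit_gaussians(ablated["cat"], ablated["grumpy cat"]))
```

In this toy, only the parameter tied to the target concept is updated, so the anchor and the unrelated concept keep their original distributions. This mirrors, in a very loose sense, the goal of preserving closely related concepts while the target is ablated.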