Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
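The distribution-matching idea can be illustrated with a toy sketch. The abstract does not state the training objective, so everything below is an assumption made for illustration: a linear `toy_denoiser` stands in for the diffusion model's noise predictor, the target and anchor concepts are random conditioning vectors, and the loss is a plain squared error that pulls the fine-tuned model's prediction for the target concept toward a frozen copy's prediction for the anchor concept. All names (`toy_denoiser`, `matching_loss`, `frozen`, `tuned`) are hypothetical and are not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, COND_DIM = 4, 3  # toy sizes, chosen arbitrarily


def toy_denoiser(weights, x_noisy, cond):
    """Hypothetical stand-in for a noise predictor: linear in [x_noisy, cond]."""
    return np.concatenate([x_noisy, cond]) @ weights


# Frozen copy of the "pretrained" model and the copy being fine-tuned.
frozen = rng.normal(size=(IMG_DIM + COND_DIM, IMG_DIM))
tuned = frozen.copy()

# Assumed conditioning vectors for the concept to ablate and its anchor.
target_cond = rng.normal(size=COND_DIM)
anchor_cond = rng.normal(size=COND_DIM)


def matching_loss(x_noisy):
    """Squared error between the tuned prediction on the target concept and
    the frozen prediction on the anchor concept (treated as a constant)."""
    residual = (toy_denoiser(tuned, x_noisy, target_cond)
                - toy_denoiser(frozen, x_noisy, anchor_cond))
    return float(np.mean(residual ** 2)), residual


x_eval = rng.normal(size=IMG_DIM)  # held-out noisy input for monitoring
initial_loss, _ = matching_loss(x_eval)

for _ in range(300):
    x_noisy = rng.normal(size=IMG_DIM)
    _, residual = matching_loss(x_noisy)
    z = np.concatenate([x_noisy, target_cond])
    # Analytic gradient of the mean squared residual w.r.t. the tuned weights.
    grad = 2.0 * np.outer(z, residual) / residual.size
    tuned -= 0.05 * grad  # plain gradient step; learning rate is arbitrary

final_loss, _ = matching_loss(x_eval)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

In this toy setting, driving the loss toward zero means that conditioning on the target concept yields the same prediction the original model gave for the anchor, which is the sense in which the target is "ablated". The sketch does not model the preservation of closely related concepts that the abstract reports.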