Erasing Concepts from Diffusion Models

Motivated by recent advancements in text-to-image diffusion, we study erasure of specific concepts from the model's weights. While Stable Diffusion has shown promise in producing explicit or realistic artwork, it has raised concerns regarding its potential for misuse. We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher. We benchmark our method against previous approaches that remove sexually explicit content and demonstrate its effectiveness, performing on par with Safe Latent Diffusion and censored training. To evaluate artistic style removal, we erase five modern artists from the network and conduct a user study to assess human perception of the removed styles. Unlike previous methods, our approach removes concepts from a diffusion model permanently rather than modifying the output at inference time, so it cannot be circumvented even if a user has access to the model weights. Our code, data, and results are available at https://erasing.baulab.info/
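The abstract does not spell out how negative guidance acts as a teacher, so the following is only a minimal sketch under stated assumptions. It assumes a frozen copy of the model supplies an unconditional and a concept-conditioned noise prediction, and that the fine-tuned model is trained to match the unconditional prediction pushed away from the concept direction. The function names, the `eta` guidance strength, and the use of plain Python lists in place of tensors are all illustrative choices, not the authors' implementation.

```python
from typing import List, Sequence


def negative_guidance_target(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    eta: float = 1.0,
) -> List[float]:
    """Teacher target (assumed form): the frozen model's unconditional
    prediction moved *away* from the concept-conditioned prediction.

    eta is a hypothetical guidance strength; eta = 0 returns the
    unconditional prediction unchanged.
    """
    return [u - eta * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def erasure_loss(student_pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between the fine-tuned model's concept-conditioned
    prediction and the negatively guided teacher target."""
    return sum((s - t) ** 2 for s, t in zip(student_pred, target)) / len(target)


# Toy values standing in for noise predictions at one timestep.
frozen_uncond = [0.0, 1.0]
frozen_cond = [1.0, 3.0]
target = negative_guidance_target(frozen_uncond, frozen_cond, eta=1.0)
loss = erasure_loss([0.0, -1.0], target)
```

In this sketch the target depends only on the frozen model and the concept name, which is consistent with the abstract's claim that no additional data beyond the name is required; the loss reaches zero once the fine-tuned model's conditional prediction matches the guided target.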

