Diffusion models (DMs) have demonstrated exceptional performance in text-to-image (T2I) tasks, leading to their widespread adoption. The introduction of classifier-free guidance (CFG) has further improved the quality of images generated by DMs. However, CFG can also be exploited maliciously to steer the generation process toward harmful images. Existing safe guidance methods aim to mitigate this risk but often degrade the quality of clean image generation. To address this issue, we introduce the Harmful Guidance Redirector (HGR), which redirects harmful CFG directions while preserving clean CFG directions during image generation, transforming CFG into SafeCFG and achieving both high safety and high generation quality. We train HGR to redirect multiple harmful CFG directions simultaneously, demonstrating its ability to eliminate various harmful elements while preserving high-quality generation. In addition, we find that HGR can detect image harmfulness, enabling unsupervised fine-tuning of safe diffusion models without predefined clean or harmful labels. Experimental results show that with HGR, diffusion models generate images that are both high quality and safe, and that safe DMs trained in this unsupervised manner, using the harmfulness detected by HGR, also exhibit strong safety performance. The code will be publicly available.
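The abstract does not give formulas, but the mechanism it describes builds on the standard CFG update, which extrapolates the noise prediction along the conditional direction. The sketch below shows that update and, under stated assumptions, where a redirector like HGR would sit: it transforms the raw guidance direction before the extrapolation, so a clean direction can pass through unchanged while a harmful one is rotated away. The `redirector` callable is a hypothetical stand-in for the trained HGR, not the paper's actual interface.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, w=7.5):
    # Standard classifier-free guidance: extrapolate from the
    # unconditional noise prediction along the conditional direction.
    return eps_uncond + w * (eps_cond - eps_uncond)

def safe_cfg_noise(eps_uncond, eps_cond, redirector, w=7.5):
    # SafeCFG (sketch, assumed form): a redirector transforms the raw
    # CFG direction before guidance is applied. Ideally it preserves
    # clean directions and redirects harmful ones.
    direction = redirector(eps_cond - eps_uncond)
    return eps_uncond + w * direction

# Hypothetical stand-in for HGR on a clean prompt: the identity map,
# i.e. the guidance direction is preserved and SafeCFG reduces to CFG.
identity_hgr = lambda d: d

eps_u = np.zeros(4)
eps_c = np.ones(4)
out = safe_cfg_noise(eps_u, eps_c, identity_hgr, w=2.0)
```

With the identity redirector the output matches plain CFG exactly; the paper's contribution is training the redirector so that this equality holds for clean prompts while harmful guidance directions are redirected.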