Recent advances in diffusion models have significantly enhanced their ability to generate high-quality images and videos, but they have also increased the risk of producing unsafe content. Existing unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges: (1) they cannot remove harmful concepts instantly, without training; (2) their safe-generation capability depends on the collected training data; and (3) they alter model weights, risking quality degradation on content unrelated to the toxic concepts. To address these issues, we propose SAFREE, a novel, training-free approach for safe text-to-image (T2I) and text-to-video (T2V) generation that does not alter the model's weights. Specifically, we detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace, filtering out harmful content while preserving the intended semantics. To balance the trade-off between filtering toxicity and preserving safe concepts, SAFREE incorporates a novel self-validating filtering mechanism that dynamically adjusts the denoising steps when applying the filtered embeddings. Additionally, we incorporate an adaptive re-attention mechanism within the diffusion latent space to selectively diminish the influence of features related to toxic concepts at the pixel level. As a result, SAFREE ensures coherent safety checking while preserving the fidelity, quality, and safety of the output. SAFREE achieves state-of-the-art performance in suppressing unsafe content in T2I generation compared to training-free baselines, effectively filters targeted concepts while maintaining high-quality images, and shows competitive results against training-based methods. We further extend SAFREE to various T2I backbones and to T2V tasks, showcasing its flexibility and generalization. SAFREE thus provides a robust and adaptable safeguard for safe visual generation.
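The core idea of steering prompt embeddings away from a toxic-concept subspace can be sketched as a projection onto the subspace's orthogonal complement. The sketch below is illustrative only, not the paper's implementation: the function name, dimensions, and the SVD-based basis construction are our assumptions, and SAFREE's actual filtering additionally involves the self-validating and re-attention mechanisms described above.

```python
# Hypothetical sketch (not SAFREE's released code): given text embeddings for a
# set of toxic concepts, build the subspace they span via SVD and remove the
# prompt embedding's component inside that subspace.
import numpy as np

def remove_toxic_subspace(prompt_emb, toxic_embs, rank=None):
    """Project `prompt_emb` (shape (d,)) onto the orthogonal complement of the
    subspace spanned by the rows of `toxic_embs` (shape (k, d))."""
    # Thin SVD: rows of `vt` form an orthonormal basis of the row space.
    _, s, vt = np.linalg.svd(toxic_embs, full_matrices=False)
    r = rank if rank is not None else int(np.sum(s > 1e-8))
    basis = vt[:r]                                  # (r, d), orthonormal rows
    # Subtract the component lying inside the toxic subspace.
    return prompt_emb - basis.T @ (basis @ prompt_emb)

# Toy usage with random 16-dimensional stand-ins for text embeddings.
rng = np.random.default_rng(0)
toxic = rng.normal(size=(3, 16))    # 3 toxic-concept embeddings
prompt = rng.normal(size=16)        # prompt embedding to filter
safe = remove_toxic_subspace(prompt, toxic)
```

After projection, `safe` is orthogonal to every toxic-concept direction (`toxic @ safe` is numerically zero), while the component of the prompt outside that subspace, i.e. the intended semantics, is left untouched.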