Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models

Despite advances in Text-to-Image (T2I) generation models, their potential for misuse or even abuse raises serious safety concerns. Model developers have invested considerable effort in safety mechanisms that address these concerns. However, existing safety mechanisms, whether external or internal, either remain susceptible to evasion under distribution shifts or require extensive model-specific adjustments. To address these limitations, we introduce Safe-Control, a plug-and-play safety patch designed to mitigate unsafe content generation in T2I models. Using data-driven strategies and safety-aware conditions, Safe-Control injects safety control signals into the locked T2I model, acting as an update in a patch-like manner. Model developers can also construct multiple safety patches to meet evolving safety requirements and flexibly merge them into a single, unified patch. The plug-and-play design also makes Safe-Control compatible with other T2I models that share a similar denoising architecture. We conduct extensive evaluations on six diverse, public T2I models with similar generative architectures. Empirical results show that Safe-Control effectively reduces unsafe content generation across all six models while maintaining the quality and text alignment of benign images. Compared with seven state-of-the-art safety mechanisms, including both external and internal defenses, Safe-Control significantly outperforms all baselines in reducing unsafe content generation. For example, it reduces the probability of unsafe content generation to 7%, compared to approximately 20% for most baseline methods, under both unsafe prompts and the latest adversarial attacks.

