DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

In the generative AI era, diffusion models have emerged as powerful and widely adopted tools for content generation and editing across various data modalities, making the study of their potential security risks both necessary and critical. Very recently, several pioneering works have shown that diffusion models are vulnerable to backdoor attacks, calling for in-depth analysis of the security challenges facing this popular and fundamental AI technique. In this paper, we present the first systematic exploration of the detectability of poisoned noise inputs for backdoored diffusion models, an important performance metric that has received little attention in existing work. Starting from the perspective of a defender, we first analyze the properties of the trigger patterns used in existing diffusion backdoor attacks and find that distribution discrepancy plays an important role in Trojan detection. Based on this finding, we propose a low-cost trigger detection mechanism that can effectively identify poisoned input noise. We then study the same problem from the attacker's side, proposing a backdoor attack strategy that learns unnoticeable triggers to evade our proposed detection scheme. Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of both the proposed trigger detection and the detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution achieves a 100% detection rate on the Trojan triggers used in existing works. For evading trigger detection, our stealthy trigger design approach uses end-to-end learning to make the distribution of poisoned noise inputs approach that of benign noise, achieving a nearly 100% detection pass rate while keeping both the attack and benign performance of the backdoored diffusion models very high.
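
The abstract does not state which discrepancy statistic the detector uses, so the following is only a minimal sketch of the general idea under an assumed, simple statistic: pool all elements of an input noise tensor, fit a Gaussian N(mu, s^2), and compute its closed-form KL divergence to the standard normal that benign sampling noise is drawn from, KL = 0.5 * (s^2 + mu^2 - 1 - ln s^2). The image shape, the box-shaped additive trigger, the threshold value, and the helper names (`box_trigger`, `is_poisoned`, `moment_matched`, etc.) are all hypothetical and are not taken from the paper. The last helper is a crude stand-in for a stealthy trigger, not the paper's end-to-end learned trigger.

```python
import math
import random

# Hypothetical image-shaped noise: 3 channels x 32 x 32, flattened to a list.
C, H, W = 3, 32, 32
N = C * H * W


def gaussian_kl_to_standard_normal(values):
    """KL( N(mu, s^2) || N(0, 1) ) using the sample mean and variance.

    Assumed moment-based proxy for "distribution discrepancy": it only
    inspects the first two moments of the pooled input elements.
    """
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))


def benign_noise(rng):
    """Clean sampling noise x_T ~ N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(N)]


def box_trigger(size=16, value=1.0):
    """Hypothetical additive trigger: a constant square patch in the
    top-left corner of every channel (illustrative only)."""
    trig = [0.0] * N
    for c in range(C):
        for i in range(size):
            for j in range(size):
                trig[c * H * W + i * W + j] = value
    return trig


def poison(noise, trigger):
    """Poisoned input: benign noise plus the additive trigger."""
    return [x + t for x, t in zip(noise, trigger)]


def is_poisoned(values, threshold=5e-3):
    """Flag inputs whose fitted Gaussian drifts too far from N(0, 1).

    The threshold is an assumed value tuned for this toy setup only."""
    return gaussian_kl_to_standard_normal(values) > threshold


def moment_matched(values):
    """Crude evasion stand-in (hypothetical): re-standardize the poisoned
    input to zero sample mean and unit sample variance, which defeats any
    detector that only looks at these two moments."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    trigger = box_trigger()
    clean = benign_noise(rng)
    dirty = poison(benign_noise(rng), trigger)
    stealthy = moment_matched(dirty)
    for name, x in [("clean", clean), ("poisoned", dirty), ("moment-matched", stealthy)]:
        print(name, round(gaussian_kl_to_standard_normal(x), 5), is_poisoned(x))
```

In this toy setup, benign noise gives a KL value on the order of 1/N, while the box trigger shifts the mean and inflates the variance enough to cross the threshold. The moment-matched input passes, which illustrates why a detection-evading attack targets the distribution of the poisoned noise as a whole. A real detector or attack would likely need richer statistics than the two moments used here.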
