Intriguing Properties of Diffusion Models: A Large-Scale Dataset for Evaluating Natural Attack Capability in Text-to-Image Generative Models

Denoising probabilistic diffusion models have shown breakthrough performance, generating more photo-realistic images and human-level illustrations than prior models such as GANs. This high image-generation capability has stimulated many downstream applications in various areas. However, we find that this technology is a double-edged sword. We identify a new type of attack, the Natural Denoising Diffusion (NDD) attack. It is based on the finding that state-of-the-art deep neural network (DNN) models still hold their predictions even when text prompts intentionally remove the robust features that are essential to the human visual system (HVS). The NDD attack exploits the natural attack capability of diffusion models to generate low-cost, model-agnostic, and transferable adversarial attacks. Motivated by this finding, we construct a large-scale dataset, the Natural Denoising Diffusion Attack (NDDA) dataset, to systematically evaluate the risk posed by the natural attack capability of state-of-the-art text-to-image diffusion models. We evaluate the natural attack capability by answering 6 research questions. Through a user study confirming the validity of the NDD attack, we find that it can achieve an 88% detection rate while being stealthy to 93% of human subjects. We also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against an autonomous driving (AD) vehicle and find that 73% of the physically printed attacks are detected as a stop sign. We hope that our study and dataset help our community become aware of the risks of diffusion models and facilitate further research toward robust DNN models.
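The evaluation idea in the abstract (strip robust features via the text prompt, then check whether a detector still fires while humans no longer recognize the object) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the NDDA dataset's actual prompt set or scoring code: the target object is assumed to be a stop sign, and the four robust features (shape, color, text, pattern) and their prompt phrasings are hypothetical. The sketch enumerates every combination of removed features as a prompt (the empty combination serves as the benign baseline) and computes two rates: the detection rate, meaning the fraction of generated images a detector still labels as the original class, and stealthiness, meaning the fraction of human subjects who do not recognize the object. Image generation and detection are left out; boolean lists stand in for their outputs.

```python
from __future__ import annotations

from itertools import combinations

# HYPOTHETICAL robust features of a stop sign and the prompt clauses that
# remove them. Illustrative assumptions only, not the dataset's exact prompts.
ROBUST_FEATURES = {
    "shape": "that is not octagonal",
    "color": "that is blue instead of red",
    "text": 'without the word "STOP"',
    "pattern": "with no white border",
}


def make_prompts(obj: str, features: dict[str, str]) -> list[tuple[frozenset, str]]:
    """Return (removed-feature set, prompt) for every subset of features.

    The empty subset yields the benign prompt with all robust features intact.
    """
    names = sorted(features)
    prompts = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            clauses = ", ".join(features[n] for n in subset)
            prompt = f"a photo of a {obj}" + (f" {clauses}" if clauses else "")
            prompts.append((frozenset(subset), prompt))
    return prompts


def detection_rate(detected: list[bool]) -> float:
    """Fraction of generated images the detector still labels as the original class."""
    return sum(detected) / len(detected) if detected else 0.0


def stealthiness(human_recognizes: list[bool]) -> float:
    """Fraction of human responses that do NOT recognize the original object."""
    if not human_recognizes:
        return 0.0
    return sum(not r for r in human_recognizes) / len(human_recognizes)


if __name__ == "__main__":
    for removed, prompt in make_prompts("stop sign", ROBUST_FEATURES):
        print(sorted(removed), "->", prompt)
    # Placeholder outcomes standing in for detector and user-study results.
    print(detection_rate([True, True, False, True]))
    print(stealthiness([False, False, True, False]))
```

With four features this yields 2^4 = 16 prompts. In a real pipeline each prompt would go to a text-to-image diffusion model, the images to an object detector and a user study, and the resulting booleans straight into the two metric functions; the same `detection_rate` applies to physically printed attacks.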

