Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Denoising probabilistic diffusion models have shown breakthrough performance, generating more photo-realistic images and human-level illustrations than prior models such as GANs. This high image-generation capability has stimulated the creation of many downstream applications in various areas. However, we find that this technology is a double-edged sword: we identify a new type of attack, the Natural Denoising Diffusion (NDD) attack, based on the finding that state-of-the-art deep neural network (DNN) models still hold their predictions even when we intentionally remove, through text prompts, the robust features that are essential to the human visual system (HVS). By exploiting this natural attack capability in diffusion models, the NDD attack shows a high capability to generate low-cost, model-agnostic, and transferable adversarial attacks. To systematically evaluate the risk of the NDD attack, we perform a large-scale empirical study with our newly created dataset, the Natural Denoising Diffusion Attack (NDDA) dataset. We evaluate the natural attack capability by answering 6 research questions. Through a user study, we find that the attack can achieve an 88% detection rate while being stealthy to 93% of human subjects; we also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against the Tesla Model 3 and find that 73% of the physically printed attacks can be detected as stop signs. We hope that the study and dataset help our community become aware of the risks in diffusion models and facilitate further research toward robust DNN models.

