Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD

翻译：神经网络已知对对抗样本敏感：这些样本是自然例子的微小变异，旨在故意误导模型。尽管在数字和物理场景中可以通过基于梯度的技术轻松生成，但它们往往与自然图像的实际数据分布存在显著差异，导致强度与隐蔽性之间存在权衡。本文提出了一种名为基于扩散的投影梯度下降（Diff-PGD）的新框架，用于生成逼真的对抗样本。通过利用扩散模型引导的梯度，Diff-PGD确保对抗样本既保持与原始数据分布的接近，又维持其有效性。此外，我们的框架可轻松针对特定任务进行定制，如数字攻击、物理世界攻击和基于风格的攻击。与现有的自然风格对抗样本生成方法相比，本框架实现了对抗损失与其他替代损失（如内容/平滑度/风格损失）的分离，从而更加稳定和可控。最后，我们证明通过Diff-PGD生成的样本比传统基于梯度的方法具有更好的可迁移性和抗净化能力。代码将发布在https://github.com/xavihart/Diff-PGD

相关内容

对抗样本

关注 13

对抗样本由Christian Szegedy等人提出，是指在数据集中通过故意添加细微的干扰所形成的输入样本，导致模型以高置信度给出一个错误的输出。在正则化背景下，通过对抗训练减少原有独立同分布的测试集的错误率——在对抗扰动的训练集样本上训练网络。对抗样本是指通过在数据中故意添加细微的扰动生成的一种输入样本，能够导致神经网络模型给出一个错误的预测结果。实质：对抗样本是通过向输入中加入人类难以察觉的扰动生成，能够改变人工智能模型的行为。其基本目标有两个，一是改变模型的预测结果；二是加入到输入中的扰动在人类看起来不足以引起模型预测结果的改变，具有表面上的无害性。对抗样本的相关研究对自动驾驶、智能家居等应用场景具有非常重要的意义。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日