Diffusion-based purification defenses leverage diffusion models to remove the crafted perturbations from adversarial examples and achieve state-of-the-art robustness. Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph, which raises the potential problems of gradient obfuscation, high memory cost, and unbounded randomness. In this paper, we propose a unified framework, DiffAttack, to perform effective and efficient attacks against diffusion-based purification defenses, including both DDPM and score-based approaches. In particular, we propose a deviated-reconstruction loss at intermediate diffusion steps that induces inaccurate density gradient estimation and thereby tackles the problem of vanishing/exploding gradients. We also provide a segment-wise forwarding-backwarding algorithm, which leads to memory-efficient gradient backpropagation. We validate the attack effectiveness of DiffAttack against existing adaptive attacks on CIFAR-10 and ImageNet. We show that, compared with SOTA attacks, DiffAttack decreases the robust accuracy of models by over 20% on CIFAR-10 under $\ell_\infty$ attack $(\epsilon=8/255)$, and by over 10% on ImageNet under $\ell_\infty$ attack $(\epsilon=4/255)$. We conduct a series of ablation studies and find that (1) DiffAttack with the deviated-reconstruction loss added over uniformly sampled time steps is more effective than with the loss added over only the initial/final steps, and (2) diffusion-based purification with a moderate diffusion length is more robust under DiffAttack.
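To illustrate why backpropagating segment by segment can save memory on a very deep purification chain, the sketch below uses a toy scalar recurrence. It is not the DiffAttack implementation: `step`, `step_grad`, `full_backprop`, `segment_wise_backprop`, and `segment_len` are hypothetical names, the update rule is a stand-in for a real denoising step, and the scheme shown is generic checkpoint-and-recompute, which is only one plausible reading of the segment-wise forwarding-backwarding idea.

```python
import math


def step(x, t):
    """Toy stand-in for one purification step (hypothetical, not a real denoiser)."""
    return x + 0.1 * math.tanh(x) + 0.01 * t


def step_grad(x, t):
    """Derivative of step(x, t) with respect to x."""
    return 1.0 + 0.1 * (1.0 - math.tanh(x) ** 2)


def full_backprop(x0, n_steps):
    """Baseline: keep every intermediate state, so memory grows with n_steps."""
    xs = [x0]
    for t in range(n_steps):
        xs.append(step(xs[-1], t))
    grad = 1.0
    for t in reversed(range(n_steps)):
        grad *= step_grad(xs[t], t)  # chain rule, last step first
    return xs[-1], grad


def segment_wise_backprop(x0, n_steps, segment_len):
    """Keep only one state per segment, then recompute each segment on the way back."""
    # Forward pass: store the input of every segment, discard everything else.
    checkpoints = {}
    x = x0
    for t in range(n_steps):
        if t % segment_len == 0:
            checkpoints[t] = x
        x = step(x, t)
    out = x

    # Backward pass: visit segments from last to first.
    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment_len, n_steps)
        # Recompute the states x_start, ..., x_{end-1} inside this segment only.
        xs = [checkpoints[start]]
        for t in range(start, end - 1):
            xs.append(step(xs[-1], t))
        # Apply the chain rule within the segment, last step first.
        for t in reversed(range(start, end)):
            grad *= step_grad(xs[t - start], t)
    return out, grad


if __name__ == "__main__":
    print(full_backprop(0.3, 100))
    print(segment_wise_backprop(0.3, 100, segment_len=10))
```

In this toy setting, both functions return the same output and gradient, but the segment-wise version holds roughly `n_steps / segment_len + segment_len` states at once instead of `n_steps + 1`, at the cost of one extra forward pass.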