Combining diffusion-based generative models with classifiers continues to deliver state-of-the-art performance on adversarial robustness benchmarks. Known as adversarial purification, this approach exploits a diffusion model's ability to identify high-density regions of the data distribution and thereby purify adversarial perturbations from inputs. However, existing diffusion-based purification defenses are impractically slow and limited in robustness because of the low noise levels used in the diffusion process. This low-noise design aims to preserve the semantic features of the original input, thereby minimizing utility loss on benign inputs. Our findings indicate that systematically amplifying noise throughout the diffusion process improves the robustness of adversarial purification. This, however, presents a key challenge: noise levels cannot be increased arbitrarily without distorting the input. To address this problem, we introduce high levels of noise during the forward process and propose the ring proximity correction, which gradually eliminates adversarial perturbations while closely preserving the original data sample. As a second contribution, we propose a new stochastic sampling method that injects additional noise during the reverse diffusion process to dilute adversarial perturbations. Without relying on gradient obfuscation, these contributions set a new robust accuracy record of 44.23% on ImageNet under AutoAttack ($\ell_{\infty}=4/255$), an improvement of +2.07% over the previous best work. Furthermore, our method reduces inference time to 1.08 seconds per sample on ImageNet, a $47\times$ speedup over the existing state-of-the-art approach, making it far more practical for real-world defensive scenarios.
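The purification pipeline described above (forward-process noising followed by reverse-process denoising before classification) can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's method: the variance-preserving forward step is standard in diffusion models, while the `denoiser` argument stands in for the learned reverse process, and `alpha_bar` for the noise schedule at the chosen stopping time.

```python
import numpy as np

def forward_noise(x, alpha_bar, rng):
    # Variance-preserving forward diffusion step q(x_t | x_0):
    #   x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    # Smaller alpha_bar means more injected noise, which dilutes
    # adversarial perturbations more aggressively.
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps

def purify(x, denoiser, alpha_bar, rng):
    # Noise the (possibly adversarial) input, then apply a denoiser as
    # a stand-in for the learned reverse diffusion process; the cleaned
    # sample would then be passed to the downstream classifier.
    x_t = forward_noise(x, alpha_bar, rng)
    return denoiser(x_t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_adv = rng.standard_normal((1, 3, 8, 8))   # toy "adversarial image"
    identity_denoiser = lambda z: z             # placeholder reverse process
    x_clean = purify(x_adv, identity_denoiser, alpha_bar=0.5, rng=rng)
    print(x_clean.shape)
```

The tension the abstract describes appears directly in `alpha_bar`: driving it lower removes more of the perturbation but also destroys more of the input's semantics, which is what the proposed corrections aim to compensate for.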