Diffusion-Based Purification (DBP) has emerged as an effective defense mechanism against adversarial attacks. The efficacy of DBP has been attributed to the forward diffusion process, which narrows the distribution gap between clean and adversarial images through the addition of Gaussian noise. Although this explanation has some theoretical support, the significance of its contribution to robustness remains unclear. In this paper, we argue that the inherent stochasticity in the DBP process is the primary driver of its robustness. To explore this, we introduce a novel Deterministic White-Box (DW-box) evaluation protocol to assess robustness in the absence of stochasticity, and we analyze attack trajectories and loss landscapes under this protocol. Our findings suggest that DBP models primarily leverage stochasticity to evade effective attack directions, while their ability to purify adversarial perturbations can be weak. To further enhance the robustness of DBP models, we introduce Adversarial Denoising Diffusion Training (ADDT), which incorporates classifier-guided adversarial perturbations into diffusion training, thereby strengthening the DBP models' ability to purify adversarial perturbations. Additionally, we propose Rank-Based Gaussian Mapping (RBGM) to make perturbations more compatible with diffusion models. Experimental results validate the effectiveness of ADDT. In conclusion, our study suggests that future research on DBP can benefit from the perspective of decoupling stochasticity-based and purification-based robustness.
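The distribution-narrowing effect attributed to the forward diffusion process can be sketched numerically. The following is a minimal illustration (not the paper's implementation) of the standard DDPM forward-noising step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, showing that the gap between a clean and an adversarially perturbed input shrinks as the timestep grows; the schedule parameters and perturbation budget are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of DDPM forward noising, the step DBP applies
# before purification: x_t = sqrt(abar_t)*x_0 + sqrt(1 - abar_t)*eps.
# Schedule values (beta range, T) follow common DDPM defaults and are
# assumptions, not taken from the paper.

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

def forward_diffuse(x0, t, alpha_bar, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

alpha_bar = make_alpha_bar()
rng = np.random.default_rng(0)

x_clean = rng.standard_normal(64)
# Hypothetical L_inf perturbation (FGSM-style sign noise, eps = 0.03).
x_adv = x_clean + 0.03 * np.sign(rng.standard_normal(64))

def gap_at(t):
    # Same seed -> identical eps, isolating how the signal term shrinks.
    a = forward_diffuse(x_clean, t, alpha_bar, np.random.default_rng(1))
    b = forward_diffuse(x_adv, t, alpha_bar, np.random.default_rng(1))
    return float(np.linalg.norm(a - b))

# The clean/adversarial gap is damped by sqrt(abar_t), which decays with t.
print(gap_at(10), gap_at(900))
```

Because the same noise is drawn for both inputs, the residual gap is exactly sqrt(ᾱ_t)·‖x_clean − x_adv‖, which makes the narrowing effect of the forward process explicit.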