We question the current practice for evaluating diffusion-based purification methods. Diffusion-based purification aims to remove adversarial effects from an input data point at test time. The approach has gained increasing attention as an alternative to adversarial training because it decouples training from testing. Well-known white-box attacks are often employed to measure the robustness of purification. However, it is unknown whether these attacks are the most effective ones against diffusion-based purification, since they are often tailored to adversarial training. We analyze current practices and provide a new guideline for measuring the robustness of purification methods against adversarial attacks. Based on our analysis, we further propose a new purification strategy that improves robustness over current diffusion-based purification methods.