We question the current evaluation practice for diffusion-based purification methods. Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time. The approach has gained increasing attention as an alternative to adversarial training because it decouples training from testing. Well-known white-box attacks are often employed to measure the robustness of purification. However, it is unknown whether these attacks are the most effective against diffusion-based purification, since they are often tailored to adversarial training. We analyze current practices and provide a new guideline for measuring the robustness of purification methods against adversarial attacks. Based on our analysis, we further propose a new purification strategy that achieves results competitive with state-of-the-art adversarial training approaches.