Diffusion-based purification (DiffPure) has recently been recognized as an effective defense against adversarial examples. However, we find that DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, is suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliability of existing evaluations of DiffPure is questionable, as they rely on weak adaptive attacks. In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. ADBM directly constructs a reverse bridge from the diffused adversarial data back to the original clean examples, enhancing the purification capabilities of the original diffusion models. Through theoretical analysis and experimental validation across various scenarios, ADBM proves to be a superior and robust defense mechanism, offering significant promise for practical applications.