Diffusion models have been leveraged to perform adversarial purification and thus provide both empirical and certified robustness for a standard model. Meanwhile, various robustly trained smoothed models have been studied to improve certified robustness. This raises a natural question: can diffusion models be used to further improve the certified robustness of those robustly trained smoothed models? In this work, we first show theoretically that instances recovered by diffusion models lie in a bounded neighborhood of the original instance with high probability. We also show that the "one-shot" denoising diffusion probabilistic model (DDPM) can approximate the mean of the generated distribution of a continuous-time diffusion model, and that this mean approximates the original instance under mild conditions. Inspired by this analysis, we propose DiffSmooth, a certifiably robust pipeline that first performs adversarial purification via diffusion models and then maps the purified instances to a common region via a simple yet effective local smoothing strategy. We conduct extensive experiments on different datasets and show that DiffSmooth achieves state-of-the-art (SOTA) certified robustness compared with eight baselines. For instance, DiffSmooth improves the SOTA certified accuracy from $36.0\%$ to $53.0\%$ under $\ell_2$ radius $1.5$ on ImageNet. The code is available at [https://github.com/javyduck/DiffSmooth].
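As a rough illustration of the pipeline described above, the Python sketch below wraps a purify-then-locally-smooth predictor inside standard randomized-smoothing certification: Gaussian noise is added to the input, a denoiser purifies it, and the base classifier votes over several small local perturbations. Everything here is an illustrative assumption, not the released DiffSmooth code. That includes the `denoise` callable (standing in for a one-shot diffusion denoiser), the toy classifier, the parameter names (`sigma_local`, `m`, `n0`, `n`, `alpha`) and the one-sided Hoeffding confidence bound, which is swapped in for the Clopper-Pearson bound to avoid a SciPy dependency.

```python
import numpy as np
from statistics import NormalDist


def local_smoothing_predict(x_pur, classifier, num_classes, sigma_local, m, rng):
    """Majority vote of the base classifier over m Gaussian perturbations of x_pur."""
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(m):
        noisy = x_pur + sigma_local * rng.standard_normal(x_pur.shape)
        votes[classifier(noisy)] += 1
    return int(np.argmax(votes))


def sample_counts(x, denoise, classifier, num_classes, sigma, sigma_local, m, num, rng):
    """Class counts of the purify-then-locally-smooth predictor under N(0, sigma^2 I) input noise."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(num):
        x_noisy = x + sigma * rng.standard_normal(x.shape)  # outer randomized-smoothing noise
        x_pur = denoise(x_noisy)  # hypothetical stand-in for a one-shot diffusion denoiser
        counts[local_smoothing_predict(x_pur, classifier, num_classes, sigma_local, m, rng)] += 1
    return counts


def certify(x, denoise, classifier, num_classes, sigma,
            sigma_local=0.1, m=5, n0=50, n=200, alpha=0.001, seed=0):
    """Return (predicted class, certified l2 radius), or (-1, 0.0) when abstaining."""
    rng = np.random.default_rng(seed)
    # Select the candidate top class with a separate batch of samples.
    top = int(np.argmax(sample_counts(x, denoise, classifier, num_classes,
                                      sigma, sigma_local, m, n0, rng)))
    # Estimate the top-class probability with fresh samples.
    counts = sample_counts(x, denoise, classifier, num_classes,
                           sigma, sigma_local, m, n, rng)
    p_hat = counts[top] / n
    # One-sided Hoeffding lower confidence bound at level 1 - alpha (assumption, see lead-in).
    p_lower = p_hat - np.sqrt(np.log(1.0 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return -1, 0.0  # abstain: cannot certify a majority class
    return top, sigma * NormalDist().inv_cdf(float(p_lower))


# Toy usage: identity "denoiser" and a classifier that thresholds the first coordinate.
print(certify(np.array([3.0, 0.0]), lambda z: z, lambda z: int(z[0] > 0),
              num_classes=2, sigma=0.5))
```

Keeping the selection samples separate from the estimation samples follows the usual randomized-smoothing certification procedure. The certified radius is $\sigma\,\Phi^{-1}(\underline{p_A})$, where $\underline{p_A}$ is the lower confidence bound on the top-class probability.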