Generative learning, recognized for its effective modeling of data distributions, offers inherent advantages in handling out-of-distribution instances, especially for enhancing robustness to adversarial attacks. Among these, diffusion classifiers, utilizing powerful diffusion models, have demonstrated superior empirical robustness. However, a comprehensive theoretical understanding of their robustness is still lacking, raising concerns about their vulnerability to stronger future attacks. In this study, we prove that diffusion classifiers possess \(O(1)\) Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. To achieve non-constant Lipschitzness, thereby obtaining much tighter certified robustness, we generalize diffusion classifiers to classify Gaussian-corrupted data. This involves deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes' theorem. Experimental results show the superior certified robustness of these Noised Diffusion Classifiers (NDCs). Notably, we achieve over 80% and 70% certified robustness on CIFAR-10 under adversarial perturbations with \(\ell_2\) norms less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data.
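The classification pipeline described above (ELBO as a likelihood surrogate, combined via Bayes' theorem) can be sketched as follows; this is a generic diffusion-classifier formulation under a uniform class prior and the standard \(\epsilon\)-prediction ELBO parameterization, not necessarily the exact weighting used in the paper:

```latex
% Approximate the conditional log-likelihood by its evidence lower bound,
% where \epsilon_\theta is the noise-prediction network and w_t a timestep weight:
\log p_\theta(x \mid y) \;\gtrsim\; \mathrm{ELBO}(x, y)
  = -\,\mathbb{E}_{t,\,\epsilon}\!\left[ w_t \,\big\| \epsilon - \epsilon_\theta(x_t, t, y) \big\|_2^2 \right] + C.

% With a uniform prior p(y), Bayes' theorem gives the classification probability:
p_\theta(y \mid x) \;\approx\;
  \frac{\exp\!\big(\mathrm{ELBO}(x, y)\big)}
       {\sum_{y'} \exp\!\big(\mathrm{ELBO}(x, y')\big)}.
```

The constant \(C\) is shared across classes and therefore cancels in the softmax, which is why the ELBO can stand in for the exact likelihood when only the argmax over classes matters.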