Diffusion models have recently been employed as generative classifiers for robust classification. However, a comprehensive theoretical understanding of the robustness of diffusion classifiers is still lacking, raising the question of whether they will be vulnerable to stronger future attacks. In this work, we propose a new family of diffusion classifiers, named Noised Diffusion Classifiers~(NDCs), that possess state-of-the-art certified robustness. Specifically, we generalize diffusion classifiers to classify Gaussian-corrupted data by deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and computing classification probabilities via Bayes' theorem. We then integrate these generalized diffusion classifiers with randomized smoothing to construct smoothed classifiers possessing non-constant Lipschitzness. Experimental results demonstrate the superior certified robustness of our proposed NDCs. Notably, we are the first to achieve 80\%+ and 70\%+ certified robustness on CIFAR-10 under adversarial perturbations with $\ell_2$ norm less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data.
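The pipeline the abstract describes can be sketched in miniature: per-class ELBOs stand in for log-likelihoods, Bayes' theorem with a uniform prior turns them into class probabilities, and a Monte Carlo randomized-smoothing vote yields a certified $\ell_2$ radius of the familiar form $\sigma\,\Phi^{-1}(p_A)$. This is a hedged toy sketch, not the paper's implementation: `elbo` below is a hypothetical placeholder scorer (a real NDC evaluates the diffusion ELBO on the Gaussian-corrupted input), and the certificate uses the simple point-estimate radius rather than a statistically rigorous lower confidence bound on $p_A$.

```python
import math
import random
from statistics import NormalDist

def elbo(x, y, sigma):
    # Placeholder "ELBO": negative squared distance to a fixed per-class
    # prototype. A real NDC would evaluate the diffusion ELBO of the
    # Gaussian-corrupted input x under the class-y conditional model.
    rng = random.Random(y)  # deterministic toy prototype per class
    mu = [rng.gauss(0.0, 1.0) for _ in x]
    return -sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / (2 * sigma ** 2)

def ndc_posterior(x, num_classes, sigma):
    # Bayes' theorem with a uniform prior: p(y|x) proportional to exp(ELBO_y(x)).
    log_lik = [elbo(x, y, sigma) for y in range(num_classes)]
    m = max(log_lik)  # subtract the max for numerical stability
    unnorm = [math.exp(l - m) for l in log_lik]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def certified_radius(x, num_classes, sigma, n_samples=200, seed=0):
    # Monte Carlo randomized smoothing: classify noisy copies of x, take a
    # majority vote, and certify the l2 radius sigma * Phi^{-1}(pA)
    # (point estimate of pA; a rigorous certificate would lower-bound it).
    rng = random.Random(seed)
    votes = [0] * num_classes
    for _ in range(n_samples):
        noisy = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        p = ndc_posterior(noisy, num_classes, sigma)
        votes[p.index(max(p))] += 1
    p_a = min(max(max(votes) / n_samples, 1e-6), 1 - 1e-6)
    radius = max(0.0, sigma * NormalDist().inv_cdf(p_a))
    return votes.index(max(votes)), radius
```

The key structural point the sketch illustrates is that the same ELBO-based classifier serves both roles: it produces the Bayes posterior on noised inputs, and that same noised-input classifier is what randomized smoothing aggregates, which is why generalizing the ELBO to Gaussian-corrupted data is the enabling step.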