Adversarial attacks induce misclassification by introducing subtle perturbations to inputs. Diffusion models have recently been applied to image classifiers to improve adversarial robustness, either through adversarial training or by purifying adversarial noise. However, diffusion-based adversarial training often suffers from convergence difficulties and high computational cost, while diffusion-based purification inevitably causes data shift and is considered susceptible to stronger adaptive attacks. To tackle these issues, we propose the Truth Maximization Diffusion Classifier (TMDC), a generative Bayesian classifier built on pre-trained diffusion models and Bayes' theorem. Unlike data-driven classifiers, TMDC uses the conditional likelihood from diffusion models to determine the class probabilities of input images, which insulates it against the effects of data shift and the limitations of adversarial training. Moreover, to enhance TMDC's resilience against more potent adversarial attacks, we propose an optimization strategy for diffusion classifiers. This strategy post-trains the diffusion model on perturbed datasets with ground-truth labels as conditions, guiding the diffusion model to learn the data distribution and to maximize the likelihood under the ground-truth labels. The proposed method achieves state-of-the-art performance on the CIFAR10 dataset against heavy white-box attacks and strong adaptive attacks. Specifically, with $\epsilon=0.05$, TMDC achieves robust accuracies of 82.81% against $l_{\infty}$ norm-bounded perturbations and 86.05% against $l_{2}$ norm-bounded perturbations.
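The abstract describes classifying an image by combining the class-conditional likelihood from a diffusion model with Bayes' theorem. A minimal sketch of that idea follows. It rests on assumptions the abstract does not state: the conditional log-likelihood is approximated, up to a constant, by the negative noise-prediction error with uniform timestep weights, and the class prior is uniform unless given. The names `eps_model`, `diffusion_loss_per_class`, `class_posterior`, and the toy class means are hypothetical and chosen for illustration only; they are not TMDC's actual implementation.

```python
import numpy as np


def diffusion_loss_per_class(x, eps_model, num_classes, alphas_bar,
                             n_samples=32, rng=None):
    """Monte Carlo estimate of the noise-prediction error for each label.

    Assumption: this error stands in for -log p(x | y) up to a constant,
    using uniform timestep weights rather than the exact ELBO weights.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    losses = np.zeros(num_classes)
    for _ in range(n_samples):
        t = rng.integers(len(alphas_bar))       # random timestep index
        eps = rng.standard_normal(x.shape)      # Gaussian noise
        a = alphas_bar[t]
        x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # forward diffusion
        # Score every candidate label on the same noised sample.
        for y in range(num_classes):
            losses[y] += np.mean((eps - eps_model(x_t, t, y)) ** 2)
    return losses / n_samples


def class_posterior(losses, log_prior=None):
    """Bayes' rule: p(y | x) is proportional to p(x | y) p(y)."""
    logits = -np.asarray(losses, dtype=float)
    if log_prior is not None:
        logits = logits + np.asarray(log_prior, dtype=float)
    logits = logits - logits.max()              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Toy stand-in for a conditional diffusion model (hypothetical): each class
# is a point mass at its mean, for which the ideal noise prediction is known.
alphas_bar = np.linspace(0.99, 0.01, 10)
means = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])


def eps_model(x_t, t, y):
    a = alphas_bar[t]
    return (x_t - np.sqrt(a) * means[y]) / np.sqrt(1.0 - a)


x = means[1].copy()                             # a sample from class 1
losses = diffusion_loss_per_class(x, eps_model, 3, alphas_bar)
posterior = class_posterior(losses)
predicted = int(np.argmax(posterior))
```

In this sketch the predicted label is simply the one whose conditional noise prediction fits the noised input best. A real system would replace `eps_model` with a pre-trained conditional diffusion network and would typically need many more noise samples per image.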