Adversarial attacks induce misclassification by introducing subtle perturbations. Recently, diffusion models have been applied to image classifiers to improve adversarial robustness, either through adversarial training or by purifying adversarial noise. However, diffusion-based adversarial training often suffers from convergence difficulties and high computational cost. Additionally, diffusion-based purification inevitably introduces data shift and is considered vulnerable to stronger adaptive attacks. To address these issues, we propose the Truth Maximization Diffusion Classifier (TMDC), a generative Bayesian classifier built on pre-trained diffusion models and Bayes' theorem. Unlike data-driven classifiers, TMDC follows Bayesian principles: it uses the class-conditional likelihood from the diffusion model to compute the class probabilities of an input image, thereby insulating it against the effects of data shift and the limitations of adversarial training. Moreover, to strengthen TMDC's resilience against more potent adversarial attacks, we propose an optimization strategy for diffusion classifiers. This strategy post-trains the diffusion model on perturbed datasets with ground-truth labels as conditions, guiding the model to learn the data distribution and to maximize the likelihood under the ground-truth labels. The proposed method achieves state-of-the-art performance on the CIFAR10 dataset against heavy white-box attacks and strong adaptive attacks. Specifically, with $\epsilon=0.05$, TMDC achieves robust accuracies of 82.81% against $l_{\infty}$ norm-bounded perturbations and 86.05% against $l_{2}$ norm-bounded perturbations.
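The Bayesian classification step can be illustrated with a small sketch. It assumes the common diffusion-classifier formulation, in which $\log p(x \mid y)$ is approximated (up to scale and an additive constant) by the negative expected noise-prediction error $\mathbb{E}_{t,\epsilon}\lVert \epsilon - \epsilon_\theta(x_t, t, y) \rVert^2$ of a class-conditional model, and $p(y \mid x) \propto p(x \mid y)\,p(y)$ is then computed with a uniform prior. The abstract does not specify TMDC's likelihood estimator, timestep weighting, or prior, so these choices, the helper names `conditional_denoising_loss` and `diffusion_classify`, and the toy noise predictor are all hypothetical.

```python
import numpy as np

def conditional_denoising_loss(x, y, eps_model, alphas_bar, ts, eps):
    """Monte Carlo estimate of E_{t,eps} ||eps - eps_model(x_t, t, y)||^2.

    Assumption: this unweighted error stands in for -log p(x | y),
    up to scale and an additive constant.
    """
    total = 0.0
    for t, e in zip(ts, eps):
        a = alphas_bar[t]
        x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * e  # forward noising q(x_t | x)
        total += np.mean((e - eps_model(x_t, t, y)) ** 2)
    return total / len(ts)

def diffusion_classify(x, eps_model, num_classes, alphas_bar, n_samples=64, seed=0):
    """Hypothetical Bayes-rule classifier: p(y | x) from per-class denoising errors."""
    rng = np.random.default_rng(seed)
    # Reuse the same (t, eps) draws for every class so the comparison is paired.
    ts = rng.integers(0, len(alphas_bar), size=n_samples)
    eps = rng.standard_normal((n_samples,) + x.shape)
    losses = np.array([
        conditional_denoising_loss(x, y, eps_model, alphas_bar, ts, eps)
        for y in range(num_classes)
    ])
    logits = -losses         # log p(x | y) proxy; a uniform prior p(y) cancels
    logits -= logits.max()   # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()       # Bayes' rule: normalize over classes

# Toy check (hypothetical): class y is a point mass at mus[y], so the ideal
# noise predictor is eps_hat = (x_t - sqrt(a_bar) * mus[y]) / sqrt(1 - a_bar).
alphas_bar = np.linspace(0.99, 0.01, 100)
mus = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])

def toy_eps_model(x_t, t, y):
    a = alphas_bar[t]
    return (x_t - np.sqrt(a) * mus[y]) / np.sqrt(1.0 - a)

probs = diffusion_classify(mus[1], toy_eps_model, num_classes=3, alphas_bar=alphas_bar)
print(probs.argmax())  # 1
```

Under the same assumption, the post-training strategy would correspond to minimizing `conditional_denoising_loss(x_adv, y_true, ...)` with respect to the model parameters, where `x_adv` (a hypothetical name) is a perturbed input and `y_true` its ground-truth label; lowering this error raises the likelihood proxy for the true class, which is what the classifier compares across classes.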