Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

Diffusion models (DMs) have demonstrated great potential in the field of adversarial robustness, where DM-based defense methods can achieve superior defense capability without adversarial training. However, these methods incur very high computational cost because they rely on large-scale pre-trained DMs, which makes it difficult to evaluate them fully under strong attacks and to compare them with traditional CNN-based methods. Simply reducing the network size and the number of timesteps in DMs can significantly harm image generation quality, which invalidates previous frameworks. To address this issue, we redesign the diffusion framework so that its goal is predicting distinguishable image labels rather than generating high-quality images. Specifically, we employ an image translation framework to learn a many-to-one mapping from input samples to designed orthogonal image labels. Based on this framework, we introduce an efficient Image-to-Image diffusion classifier with a pruned U-Net structure and reduced diffusion timesteps. Beyond the framework, we redesign the optimization objective of DMs to fit the target of image classification: a new classification loss is incorporated into the DM-based image translation framework to distinguish the generated label from the labels of other classes. We evaluate the proposed classifier under various attacks on popular benchmarks. Extensive experiments show that our method achieves better adversarial robustness at lower computational cost than both DM-based and CNN-based methods. The code is available at https://github.com/hfmei/IDC.
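As a rough illustration of the idea of orthogonal image labels, the following sketch builds one label image per class and reads a class off a generated image by similarity to those labels. It is not taken from the released code: the function names `make_orthogonal_labels` and `predict_class`, the QR-based construction, the 3x32x32 image shape, and the cosine-style scoring are all assumptions made for this example, and the actual label design and decision rule may differ.

```python
import numpy as np


def make_orthogonal_labels(num_classes, shape=(3, 32, 32), seed=0):
    """Return num_classes mutually orthogonal, unit-norm label images.

    Hypothetical construction: orthonormal columns from a QR decomposition
    of a random matrix, reshaped into images.
    """
    dim = int(np.prod(shape))
    if num_classes > dim:
        raise ValueError("cannot build more orthogonal labels than pixels")
    rng = np.random.default_rng(seed)
    # Reduced QR of a (dim x num_classes) matrix yields orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    return q.T.reshape(num_classes, *shape)


def predict_class(generated, labels):
    """Pick the class whose label image is most similar to the generated image."""
    flat_labels = labels.reshape(labels.shape[0], -1)
    flat_out = generated.reshape(-1)
    # Labels are unit-norm, so dividing by the output norm gives cosine similarity.
    scores = flat_labels @ flat_out / (np.linalg.norm(flat_out) + 1e-12)
    return int(np.argmax(scores))


if __name__ == "__main__":
    labels = make_orthogonal_labels(num_classes=10)
    rng = np.random.default_rng(1)
    # Stand-in for an imperfect translated image: label of class 3 plus noise.
    noisy_output = labels[3] + 0.01 * rng.standard_normal(labels[3].shape)
    print(predict_class(noisy_output, labels))
```

Because the label images are orthogonal, an image close to one label has near-zero similarity to all the others, which is what makes the many-to-one translation target easy to decode into a class index even when the generated image is imperfect.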