In recent years, diffusion models (DMs) have drawn significant attention for their success in approximating data distributions, yielding state-of-the-art generative results. Beyond generation, however, these models have proven versatile across a range of vision applications, such as image inpainting, segmentation, and adversarial robustness. This study investigates adversarial attacks through the lens of diffusion models. Our objective, however, is not to enhance the adversarial robustness of image classifiers. Instead, we use the diffusion model to detect and analyze the anomalies that these attacks introduce into images. To that end, we systematically examine the distributional alignment of adversarial examples after they are transformed by diffusion models. We assess the efficacy of this approach on the CIFAR-10 and ImageNet datasets, evaluating the latter at varying image sizes. The results demonstrate a notable capacity to discriminate between benign and attacked images, providing compelling evidence that adversarial examples do not align with the learned manifold of DMs.