Adversarial Examples are Misaligned in Diffusion Model Manifolds

In recent years, diffusion models (DMs) have drawn significant attention for their success in approximating data distributions, yielding state-of-the-art generative results. The versatility of these models also extends beyond generation to a range of vision applications, such as image inpainting, segmentation, and adversarial robustness. This study investigates adversarial attacks through the lens of diffusion models. Our objective is not to enhance the adversarial robustness of image classifiers; instead, we use the diffusion model to detect and analyze the anomalies that these attacks introduce in images. To that end, we systematically examine how well the distributions of adversarial examples align once the examples have been transformed by diffusion models. We assess this approach on the CIFAR-10 and ImageNet datasets, with varying image sizes for the latter. The results demonstrate a notable capacity to discriminate between benign and attacked images, providing compelling evidence that adversarial instances do not align with the learned manifold of the DMs.

