We propose a probabilistic perspective on adversarial examples. This perspective allows us to view geometric restrictions on adversarial examples as distributions, enabling a seamless shift towards data-driven, semantic constraints. Building on this foundation, we present a method for creating semantics-aware adversarial examples in a principled way. Leveraging the advanced generalization capabilities of contemporary probabilistic generative models, our method produces adversarial perturbations that maintain the original image's semantics. Moreover, it offers users the flexibility to inject their own understanding of semantics into the adversarial examples. Our empirical findings indicate that the proposed method achieves enhanced transferability and higher success rates in circumventing adversarial defense mechanisms, while maintaining a low detection rate by human observers.