We propose a probabilistic perspective on adversarial examples, which allows a subjective understanding of semantics to be embedded, as a distribution, into the generation of adversarial examples in a principled manner. Although our method introduces far larger pixel-level modifications than traditional adversarial attacks, it preserves the overall semantics of the image, making the changes difficult for humans to detect. These extensive pixel-level modifications also strengthen our method's ability to deceive classifiers equipped with adversarial defenses. Our empirical findings indicate that the proposed method achieves higher success rates in circumventing adversarial defense mechanisms while remaining inconspicuous to human observers.
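The abstract states the probabilistic perspective only at a high level. As a rough, self-contained illustration of what embedding a semantic preference as a distribution into adversarial-example generation can look like, the toy sketch below perturbs an input to maximize a classifier's loss plus the log-density of a perturbation prior. The linear classifier, the Gaussian prior and its scales, and the trade-off weight are all assumptions made purely for illustration; this is not the paper's actual model or algorithm.

```python
# Illustrative sketch only (assumed setup, not the paper's method): generate an
# adversarial perturbation by trading off a classifier loss against the
# log-density of a prior that encodes which changes are considered benign.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier standing in for the target model (assumed for illustration).
d, k = 8, 3
W = rng.normal(size=(k, d))
x = rng.normal(size=d)
y = int(np.argmax(W @ x))          # the clean prediction we want to flip

# Hypothetical "semantic" prior over perturbations: a diagonal Gaussian whose
# scales say in which directions large changes are assumed to be benign.
prior_scales = np.linspace(0.1, 2.0, d)

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def objective(delta):
    # Trade off being adversarial (high cross-entropy on the clean label)
    # against staying likely under the assumed semantic prior.
    logits = W @ (x + delta)
    cross_entropy = -log_softmax(logits)[y]
    log_prior = -0.5 * np.sum((delta / prior_scales) ** 2)
    return cross_entropy + 0.1 * log_prior       # 0.1 is an arbitrary trade-off weight

def grad(delta, eps=1e-5):
    # Finite-difference gradient keeps the sketch dependency-free.
    g = np.zeros_like(delta)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        g[i] = (objective(delta + e) - objective(delta - e)) / (2 * eps)
    return g

delta = np.zeros(d)
for _ in range(200):                             # plain gradient ascent
    delta += 0.05 * grad(delta)

print("clean prediction:      ", y)
print("adversarial prediction:", int(np.argmax(W @ (x + delta))))
```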