In this study, we introduce a probabilistic perspective on adversarial examples, realized through box-constrained Langevin Monte Carlo (LMC). Building on this perspective, we develop a principled approach for generating semantics-aware adversarial examples. The approach replaces the usual geometric distance constraint with semantic constraints, which lets users incorporate their own understanding of semantics into the model. Through human evaluation, we validate that our semantics-aware adversarial examples preserve their original meaning. Experiments on the MNIST and SVHN datasets show that these examples can effectively circumvent robust adversarial training methods designed for traditional adversarial attacks.
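To make the sampling idea concrete, the following is a minimal sketch of a box-constrained Langevin Monte Carlo sampler. It is an illustration under stated assumptions, not the method used in this work: the box constraint is enforced by clipping (projection) after each Langevin step, which is only one simple way to handle it, and the energy function, `x_ori`, `w`, `sigma`, and `c` are hypothetical toy quantities chosen for the example.

```python
import numpy as np


def box_constrained_lmc(grad_energy, x0, step=1e-3, n_steps=500,
                        low=0.0, high=1.0, rng=None):
    """Projected Langevin Monte Carlo.

    Approximately samples from p(x) proportional to exp(-E(x)) restricted
    to the box [low, high]^d. Each iteration takes an unadjusted Langevin
    step and then clips the result back into the box (an assumption of
    this sketch; other constraint-handling schemes exist).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(x0, dtype=float), low, high)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Langevin update: gradient descent on E plus Gaussian noise.
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise
        # Projection onto the box, e.g. valid pixel intensities.
        x = np.clip(x, low, high)
    return x


# Hypothetical toy energy: stay close to x_ori while increasing a linear
# score w . x (a stand-in for a classifier loss).
x_ori = np.full(4, 0.5)
w = np.array([1.0, -1.0, 0.5, 0.0])


def grad_energy(x, sigma=0.1, c=1.0):
    # E(x) = ||x - x_ori||^2 / (2 * sigma^2) - c * (w . x)
    return (x - x_ori) / sigma**2 - c * w


sample = box_constrained_lmc(grad_energy, x_ori, rng=np.random.default_rng(0))
```

In this toy setup the quadratic term plays the role of a distance constraint around the original input. A semantics-aware variant would replace it with an energy that encodes the user's notion of semantic similarity.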