In this work, we formulate a novel framework for adversarial robustness based on the manifold hypothesis. This framework provides sufficient conditions for defending against adversarial examples. Building on it, we develop an adversarial purification method that combines manifold learning with variational inference to provide adversarial robustness without the need for expensive adversarial training. Experimentally, our approach provides adversarial robustness even when attackers are aware of the defense. In addition, our method can serve as a test-time defense mechanism for variational autoencoders.