Adversarial example detection, which can be conveniently applied in many scenarios, plays an important role in adversarial defense. Unfortunately, existing detection methods generalize poorly, because their training usually relies on examples generated by a single known adversarial attack, and there is a large discrepancy between these training examples and the unseen adversarial examples encountered at test time. To address this issue, we propose a novel method named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach identifies the Principal Adversarial Domains (PADs), i.e., a combination of features of adversarial examples generated by different attacks that covers a large portion of the entire adversarial feature space. We then pioneer the use of Multi-source Unsupervised Domain Adaptation in adversarial example detection, with the PADs as the source domains. Experimental results demonstrate the superior generalization ability of the proposed AED-PADA. Notably, this superiority is especially pronounced in challenging scenarios that employ the minimal magnitude constraint for the perturbations.
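To make the idea of choosing a few attack domains that jointly cover most of the adversarial feature space more concrete, the following is a minimal toy sketch, not the selection procedure of AED-PADA, which the abstract does not specify. It assumes that each attack is represented by an array of feature vectors, that coverage can be approximated by counting occupied cells of a uniform grid, and that a greedy choice is acceptable. The names `occupied_cells`, `select_principal_domains`, and `cell_size` are hypothetical and introduced only for illustration.

```python
import numpy as np


def occupied_cells(features, cell_size):
    """Return the set of grid cells (as tuples) containing at least one feature vector.

    Assumption: a uniform grid over the feature space is a usable proxy for coverage.
    """
    cells = np.floor(np.asarray(features, dtype=float) / cell_size).astype(int)
    return {tuple(int(v) for v in row) for row in cells}


def select_principal_domains(domains, k, cell_size=1.0):
    """Greedily pick up to k attack names whose features jointly occupy the most cells.

    domains: dict mapping a (hypothetical) attack name to an array of shape (n, d).
    Returns the chosen names and the fraction of all occupied cells they cover.
    """
    cell_sets = {name: occupied_cells(f, cell_size) for name, f in domains.items()}
    chosen, covered = [], set()
    for _ in range(min(k, len(cell_sets))):
        # Pick the remaining domain that adds the most not-yet-covered cells.
        best = max(
            (name for name in cell_sets if name not in chosen),
            key=lambda name: len(cell_sets[name] - covered),
        )
        chosen.append(best)
        covered |= cell_sets[best]
    total = set().union(*cell_sets.values())
    return chosen, len(covered) / len(total)
```

In this sketch, a domain whose features fall entirely inside cells already covered by an earlier choice adds nothing and is passed over in favor of a domain that occupies new regions. The selected domains would then serve as the source domains of a multi-source unsupervised domain adaptation detector, which is not shown here.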