Adversarial example detection, which can be conveniently applied in many scenarios, is important in the area of adversarial defense. Unfortunately, existing detection methods generalize poorly. Their training usually relies on examples generated by a single known adversarial attack, so there is a large discrepancy between the training examples and the unseen adversarial examples encountered at test time. To address this issue, we propose a novel method named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach first identifies the Principal Adversarial Domains (PADs), i.e., a combination of features of adversarial examples from different attacks that covers a large portion of the entire adversarial feature space. We then pioneer the use of multi-source domain adaptation in adversarial example detection, with the PADs serving as source domains. Experiments demonstrate the superior generalization ability of AED-PADA. This advantage is particularly pronounced in challenging scenarios where the perturbations are subject to the minimal magnitude constraint.