Adversarial detection aims to determine whether a given sample is adversarial, based on the discrepancy between the natural and adversarial distributions. Unfortunately, estimating or comparing two data distributions is extremely difficult, especially in high-dimensional spaces. Recently, the gradient of the log probability density with respect to the sample (a.k.a. the score) has been used as an alternative statistic. However, we find that the score is sensitive when used to identify adversarial samples, because a single sample provides insufficient information. In this paper, we propose a new statistic called the expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations. Specifically, to obtain adequate information about a single sample, we perturb it by adding various noises to capture multi-view observations of it. We theoretically prove that EPS is a proper statistic for computing the discrepancy between two samples under mild conditions. In practice, we can use a pre-trained diffusion model to estimate the EPS of each sample. Finally, we propose an EPS-based adversarial detection (EPS-AD) method, in which we develop an EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples. We also prove that the EPS-based MMD between natural and adversarial samples is larger than that among natural samples. Extensive experiments show the superior adversarial detection performance of our EPS-AD.
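The two ingredients named above, an expected score over perturbed copies of a sample and an MMD computed on those expected scores, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: `toy_score` is a hypothetical analytic stand-in for a pre-trained diffusion model (it is the exact score of standard-normal data after Gaussian noise of scale `sigma`), the noise scales and kernel bandwidth are arbitrary, and a constant shift is used as a crude placeholder for an adversarial perturbation. It is not the EPS-AD implementation.

```python
import numpy as np

def toy_score(x, sigma):
    # Hypothetical stand-in for a diffusion model's score network:
    # the exact score of N(0, (1 + sigma^2) I), i.e. standard-normal
    # data after adding Gaussian noise of scale sigma.
    return -x / (1.0 + sigma ** 2)

def expected_perturbation_score(x, score_fn, sigmas, n_draws, rng):
    # Monte Carlo average of the score over noisy copies of each row of x.
    total = np.zeros_like(x, dtype=float)
    for sigma in sigmas:
        for _ in range(n_draws):
            noisy = x + sigma * rng.standard_normal(x.shape)
            total += score_fn(noisy, sigma)
    return total / (len(sigmas) * n_draws)

def gaussian_kernel(a, b, bandwidth):
    # Pairwise Gaussian kernel between the rows of a and the rows of b.
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def mmd2(a, b, bandwidth=2.0):
    # Biased (V-statistic) estimate of the squared MMD between two sets.
    return (gaussian_kernel(a, a, bandwidth).mean()
            + gaussian_kernel(b, b, bandwidth).mean()
            - 2.0 * gaussian_kernel(a, b, bandwidth).mean())

rng = np.random.default_rng(0)
sigmas = [0.1, 0.5, 1.0]                        # assumed noise scales
natural_ref = rng.standard_normal((200, 8))     # reference natural samples
natural_test = rng.standard_normal((200, 8))    # held-out natural samples
shifted_test = natural_test + 1.5               # placeholder for adversarial samples

eps_ref = expected_perturbation_score(natural_ref, toy_score, sigmas, 20, rng)
eps_nat = expected_perturbation_score(natural_test, toy_score, sigmas, 20, rng)
eps_shift = expected_perturbation_score(shifted_test, toy_score, sigmas, 20, rng)

mmd_natural = mmd2(eps_ref, eps_nat)
mmd_shifted = mmd2(eps_ref, eps_shift)
print("MMD^2 natural vs natural:", mmd_natural)
print("MMD^2 natural vs shifted:", mmd_shifted)
```

In this toy setting the shifted set should yield the larger value, mirroring the qualitative claim that the EPS-based MMD between natural and adversarial samples exceeds that among natural samples. A real detector would replace `toy_score` with a learned score model and would need a calibrated threshold on the statistic.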