Past work on adversarial vulnerability has focused on settings where an adversary can perturb all dimensions of the model input. In contrast, a range of recent works consider the case where an adversary can perturb only (i) a limited number of input parameters or (ii) a subset of the modalities in a multimodal problem. In both cases, adversarial examples are effectively constrained to a subspace $V$ of the ambient input space $\mathcal{X}$. Motivated by this, we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$, where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} = 1$, provided $p > 1$ (the case $p=1$ presents additional subtleties, which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high-dimensional spaces.
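As a rough illustration of where the scaling $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ can come from, consider a toy linear score $f(x) = w \cdot x$. The sketch below is not taken from the paper: it assumes that $V$ is a coordinate subspace (the first $\dim(V)$ input coordinates) and that all weights $|w_i|$ have the same magnitude. The function names `max_linear_shift` and `predicted_shift` are hypothetical. Under these assumptions, Hölder duality gives the largest change in $f$ as $\epsilon \|w|_V\|_q$, which equals $\epsilon \|w\|_q (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$.

```python
import math


def max_linear_shift(w, subspace_idx, eps, p):
    """Largest |w . delta| over delta supported on subspace_idx with ||delta||_p <= eps.

    By Hoelder duality this is eps * ||w restricted to V||_q, where 1/p + 1/q = 1.
    Only p > 1 is handled; the source treats p = 1 separately.
    """
    if p <= 1:
        raise ValueError("this sketch only covers p > 1")
    w_v = [abs(w[i]) for i in subspace_idx]
    if math.isinf(p):
        return eps * sum(w_v)  # dual exponent q = 1
    q = p / (p - 1.0)
    return eps * sum(a ** q for a in w_v) ** (1.0 / q)


def predicted_shift(w, dim_v, eps, p):
    """eps * ||w||_q * (dim V / dim X)^(1/q); exact only if all |w_i| are equal."""
    n = len(w)
    q = 1.0 if math.isinf(p) else p / (p - 1.0)
    w_q = sum(abs(a) ** q for a in w) ** (1.0 / q)
    return eps * w_q * (dim_v / n) ** (1.0 / q)


# Assumed toy setup: 64 input dimensions, weights of equal magnitude 0.5.
n, eps, p = 64, 0.1, 2.0
w = [0.5 if i % 2 == 0 else -0.5 for i in range(n)]

for dim_v in (1, 8, 32, 64):
    exact = max_linear_shift(w, range(dim_v), eps, p)
    approx = predicted_shift(w, dim_v, eps, p)
    print(dim_v, exact, approx)
```

For weights of unequal magnitude the two quantities agree only approximately, and the match depends on which coordinates span $V$.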