Past work exploring adversarial vulnerability has focused on situations where an adversary can perturb all dimensions of the model input. In contrast, a range of recent works consider the case where an adversary can perturb only (i) a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both cases, adversarial examples are effectively constrained to a subspace $V$ of the ambient input space $\mathcal{X}$. Motivated by this, we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$, where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} = 1$, provided $p > 1$ (the case $p=1$ presents additional subtleties, which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high-dimensional spaces.
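One way to see where the factor $(\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ can come from is the following minimal sketch. It is an illustrative assumption, not necessarily the toy model used in the paper. Take a linear score $f(x) = w^\top x$ and restrict the perturbation $\delta$ to a coordinate subspace $V$ with $\|\delta\|_p \le \epsilon$. By Hölder duality, the largest achievable shift in the score is $\epsilon \|w_V\|_q$, where $w_V$ is $w$ restricted to the coordinates in $V$. If, as a further assumption, all weights have equal magnitude, this is exactly $(\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ times the shift available to an unrestricted attacker. The names `dual_exponent`, `max_score_shift`, and `predicted_ratio` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def dual_exponent(p):
    """Return q with 1/p + 1/q = 1 (q = inf when p = 1, q = 1 when p = inf)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)


def max_score_shift(w, mask, eps, p):
    """Largest w @ delta over ||delta||_p <= eps with delta supported on mask.

    By Hölder duality this equals eps times the l^q norm of w restricted
    to the masked coordinates.
    """
    q = dual_exponent(p)
    return eps * np.linalg.norm(w[mask], ord=q)


def predicted_ratio(dim_v, dim_x, p):
    """Scaling factor (dim V / dim X)^(1/q) from the abstract."""
    q = dual_exponent(p)
    return 1.0 if np.isinf(q) else (dim_v / dim_x) ** (1.0 / q)


# Assumed toy setting: 100 input dimensions, attacker controls 25 of them.
dim_x, dim_v, eps, p = 100, 25, 0.1, 2.0
w = np.ones(dim_x)                  # assumption: equal-magnitude weights
mask = np.zeros(dim_x, dtype=bool)
mask[:dim_v] = True                 # coordinate subspace V

full = max_score_shift(w, np.ones(dim_x, dtype=bool), eps, p)
restricted = max_score_shift(w, mask, eps, p)
ratio = restricted / full

print(ratio, predicted_ratio(dim_v, dim_x, p))
```

In this sketch the case $p = 1$ gives $q = \infty$, so the achievable shift is $\epsilon \max_{i \in V} |w_i|$ and does not grow with $\dim(V)$ when the weights have equal magnitude. That is at least consistent with the abstract treating $p = 1$ separately.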