Adversarial examples have shown a powerful ability to cause well-trained models to misclassify. Current mainstream adversarial attack methods consider only one of the $L_0$-norm, $L_2$-norm, and $L_\infty$-norm distortion measures. $L_0$-norm based methods make large modifications to individual pixels, which are visible to the naked eye, while $L_2$-norm and $L_\infty$-norm based methods are weakly robust against adversarial defenses since they diffuse tiny perturbations across all pixels. A more realistic adversarial perturbation should be both sparse and imperceptible. In this paper, we propose a novel $L_p$-norm distortion-efficient adversarial attack that not only achieves the lowest $L_2$-norm loss but also significantly reduces the $L_0$-norm distortion. To this end, we design a new optimization scheme that first optimizes an initial adversarial perturbation under an $L_2$-norm constraint, and then constructs a dimension unimportance matrix for the initial perturbation, which indicates the adversarial unimportance of each dimension. Furthermore, we introduce the concept of an adversarial threshold for the dimension unimportance matrix: every dimension of the initial perturbation whose unimportance exceeds the threshold is set to zero, greatly decreasing the $L_0$-norm distortion. Experimental results on three benchmark datasets show that, under the same query budget, the adversarial examples generated by our method have lower $L_0$-norm and $L_2$-norm distortion than the state of the art. On the MNIST dataset in particular, our attack reduces $L_2$-norm distortion by 8.1$\%$ while leaving 47$\%$ of pixels unperturbed. This demonstrates the superiority of the proposed method over its competitors in terms of adversarial robustness and visual imperceptibility.
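The sparsification step described above can be sketched as follows. This is a minimal illustration, not the paper's actual attack: the perturbation `delta`, the unimportance scores `unimportance`, and the threshold `tau` are hypothetical placeholders, since the abstract does not specify how the unimportance matrix or threshold are computed.

```python
import numpy as np

def sparsify_perturbation(delta, unimportance, tau):
    """Zero out dimensions of an initial perturbation whose adversarial
    unimportance exceeds the threshold tau, reducing the L0-norm while
    leaving the important (low-unimportance) dimensions untouched."""
    sparse = delta.copy()
    sparse[unimportance > tau] = 0.0
    return sparse

# Toy example with illustrative values: dimensions 0 and 2 exceed the
# threshold and are zeroed, so the L0-norm drops from 4 to 2.
delta = np.array([0.5, -0.2, 0.1, 0.3])
unimportance = np.array([0.9, 0.1, 0.8, 0.2])
sparse = sparsify_perturbation(delta, unimportance, tau=0.5)
```

In practice, the surviving dimensions would then carry the adversarial effect, so the threshold trades off sparsity ($L_0$) against keeping the example adversarial.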