Adversarial attacks in the input (pixel) space typically impose noise margins, such as an $L_1$ or $L_{\infty}$-norm bound, to produce imperceptibly perturbed data that confound deep neural networks. Such margins limit the magnitude of the permissible noise. In this work, we propose injecting adversarial perturbations in the latent (feature) space using a generative adversarial network, removing the need for margin-based priors. Experiments on the MNIST, CIFAR10, Fashion-MNIST, CIFAR100 and Stanford Dogs datasets support the effectiveness of the proposed method in generating adversarial attacks in the latent space while maintaining a high degree of visual realism compared with pixel-based adversarial attack methods.
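To make the pixel-space versus latent-space contrast concrete, here is a minimal NumPy sketch under explicitly hypothetical assumptions. A toy linear classifier (`w`, `logit`) and a toy tanh "generator" (`W_g`, `generator`) stand in for real networks. A one-step FGSM-style attack (`pixel_linf_attack`) stands in for margin-based pixel attacks, and plain gradient descent on the latent code (`latent_attack`) stands in for the GAN-based attack. This is not the proposed method. It only shows that the pixel attack must keep its perturbation inside an $L_{\infty}$ ball of radius `eps`, while the latent attack moves `z` without any pixel-space margin, and its output stays in the generator's range by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4  # hypothetical pixel and latent dimensions

w = rng.normal(size=D)         # hypothetical linear classifier weights
W_g = rng.normal(size=(D, K))  # hypothetical generator weights


def generator(z):
    """Toy generator G(z) = tanh(W_g z); outputs lie in [-1, 1]."""
    return np.tanh(W_g @ z)


def logit(x):
    """Classifier score; a positive value means the 'clean' class."""
    return float(w @ x)


def pixel_linf_attack(x, eps):
    """One FGSM-style step in pixel space, confined to an L_inf ball of radius eps."""
    x_adv = x - eps * np.sign(w)      # gradient of the linear logit w.r.t. x is w
    return np.clip(x_adv, -1.0, 1.0)  # keep pixels in the valid range


def latent_attack(z, n_steps=100):
    """Gradient descent on the latent code z; no pixel-space norm margin is imposed."""
    # Step 1 / (max|w| * ||W_g||_2^2) is below 2/L for this model because |tanh''| < 1,
    # so no step can increase the logit.
    step = 1.0 / (np.abs(w).max() * np.linalg.norm(W_g, 2) ** 2)
    for _ in range(n_steps):
        x = generator(z)
        # Chain rule: d logit / dz = W_g^T ((1 - tanh^2) * w)
        grad_z = W_g.T @ ((1.0 - x ** 2) * w)
        z = z - step * grad_z
    return z


z0 = rng.normal(size=K)
if logit(generator(z0)) < 0:  # G is odd, so negating z0 flips the sign of the logit
    z0 = -z0
x0 = generator(z0)

x_pix = pixel_linf_attack(x0, eps=0.1)
x_lat = generator(latent_attack(z0))

print("clean logit:", logit(x0))
print("pixel attack logit:", logit(x_pix), "max |dx|:", np.abs(x_pix - x0).max())
print("latent attack logit:", logit(x_lat), "max |dx|:", np.abs(x_lat - x0).max())
```

In this toy setup the latent attack's pixel change is not tied to `eps`. Its "realism" comes only from the output being some $G(z)$, which is a property of the sketch, not evidence for the reported experiments.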