The vulnerability of Deep Neural Networks (DNNs) to adversarial examples is well established. Existing adversarial defenses primarily aim to prevent adversarial examples from successfully attacking DNNs, rather than to prevent their generation. If the generation of adversarial examples is unregulated, images within reach are no longer secure and pose a threat to non-robust DNNs. Although gradient obfuscation attempts to address this issue, it has been shown to be circumventable. Therefore, we propose a novel adversarial defense mechanism, referred to as immune defense, which is an example-based pre-defense. This mechanism applies carefully designed quasi-imperceptible perturbations to raw images to prevent the generation of adversarial examples for those images, thereby protecting both the images and DNNs. The perturbed images are referred to as Immune Examples (IEs). For white-box immune defense, we provide both a gradient-based and an optimization-based approach. We also consider the more complex black-box immune defense. We propose Masked Gradient Sign Descent (MGSD) to reduce approximation error and stabilize the update, in order to improve the transferability of IEs and thereby ensure their effectiveness against black-box adversarial attacks. The experimental results demonstrate that the optimization-based approach achieves superior performance and better visual quality in white-box immune defense. In contrast, the gradient-based approach has stronger transferability, and the proposed MGSD significantly improves the transferability of the baselines.
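To make the idea of a gradient-based immune perturbation concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that "gradient-based" can be read as iterative sign-gradient descent on the classification loss, i.e., stepping opposite to the direction an FGSM-style attack would take, within an L-infinity budget. The toy linear-logistic model, the function names (`loss_and_grad`, `immunize`), and the parameters `eps`, `alpha`, and `steps` are all hypothetical, and the masking used by MGSD is not modeled here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy linear-logistic model and its gradient w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for this model
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def immunize(x_raw, y, w, b, eps=0.05, alpha=0.01, steps=10):
    """Hypothetical immune step: descend the loss along the gradient sign,
    keeping the result within eps of the raw input and inside [0, 1]."""
    x = list(x_raw)
    for _ in range(steps):
        _, g = loss_and_grad(x, y, w, b)
        for i in range(len(x)):
            xi = x[i] - alpha * sign(g[i])                      # descent, not ascent
            xi = min(max(xi, x_raw[i] - eps), x_raw[i] + eps)   # L-infinity budget
            x[i] = min(max(xi, 0.0), 1.0)                       # valid pixel range
    return x

# Toy usage: a 3-"pixel" image with true label 1 (all values are illustrative).
w, b = [2.0, -1.0, 0.5], -0.3
x_raw, y = [0.4, 0.6, 0.5], 1
x_ie = immunize(x_raw, y, w, b)
loss_before, _ = loss_and_grad(x_raw, y, w, b)
loss_after, _ = loss_and_grad(x_ie, y, w, b)
print(loss_before, loss_after)
```

In this toy setting the perturbed input has a lower loss on the true label than the raw input, so an attacker with the same budget starts further from the decision boundary. Whether this carries over to deep networks and to black-box transfer is exactly what the paper's experiments address; the sketch makes no claim about it.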