Adversarial attacks carefully craft minuscule, imperceptible perturbations to images in order to deceive neural networks. To counter them, adversarial purification methods seek to transform adversarial input samples into clean output images. Nonetheless, existing generative models fail to eliminate adversarial perturbations effectively, yielding suboptimal purification results. We highlight the potential threat that residual adversarial perturbations pose to target models, quantitatively establishing a relationship between perturbation scale and attack capability. Notably, the residual perturbations on a purified image stem primarily from the same-position patch and from similar patches of the adversarial sample. We propose a novel adversarial purification approach named Information Mask Purification (IMPure), which aims to eliminate adversarial perturbations extensively. Given an adversarial sample, we first mask part of the patch information and then reconstruct the masked patches, so that the reconstruction resists the perturbations carried by those same-position patches. We reconstruct all patches in parallel to obtain a cohesive image. Then, to protect the purified samples against potential perturbations from similar regions, we simulate this risk by randomly mixing the purified samples with the input samples before feeding them into the feature extraction network. Finally, we impose a combined constraint of pixel loss and perceptual loss to improve the model's reconstruction adaptability. Extensive experiments on the ImageNet dataset with three classifier models demonstrate that our approach achieves state-of-the-art results against nine adversarial attack methods. Implementation code and pre-trained weights are available at \textcolor{blue}{https://github.com/NoWindButRain/IMPure}.
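The pipeline described above (mask patches, reconstruct them in parallel, randomly mix purified and input samples, train with pixel plus perceptual loss) can be illustrated with a minimal NumPy sketch. It is not the IMPure implementation, and every function name here is hypothetical. It assumes: non-overlapping square patches; `k` complementary masks, so that each patch is hidden in exactly one pass and every patch can be reconstructed in parallel; a toy `toy_reconstruct` that stands in for the learned reconstruction network; patch-level mixing controlled by a `ratio` parameter (the abstract does not specify the granularity); MSE as the pixel loss; and a toy `feat` function in place of a real perceptual feature extractor.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W, C) image into an (N, p, p, C) array of non-overlapping patches."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p, p, C)

def unpatchify(patches, H, W):
    """Inverse of patchify: reassemble (N, p, p, C) patches into an (H, W, C) image."""
    _, p, _, C = patches.shape
    x = patches.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def complementary_masks(n_patches, k, rng):
    """Assumed scheme: assign each patch to one of k groups; pass g hides group g."""
    groups = rng.permutation(n_patches) % k
    return np.stack([groups == g for g in range(k)])  # (k, N) bool, each column has one True

def toy_reconstruct(patches, mask):
    """Stand-in for a learned reconstructor: each masked patch becomes the mean of the
    visible patches, so a patch's own (possibly perturbed) pixels never reach it."""
    out = patches.copy()
    out[mask] = patches[~mask].mean(axis=0)
    return out

def masked_purify(img, p, k, reconstruct, rng):
    """Reconstruct every patch from a view in which that patch was masked."""
    H, W, _ = img.shape
    patches = patchify(img, p)
    masks = complementary_masks(len(patches), k, rng)
    purified = np.empty_like(patches)
    for m in masks:  # the k passes are independent and could run as one batch
        rec = reconstruct(patches, m)
        purified[m] = rec[m]  # keep only the patches hidden in this pass
    return unpatchify(purified, H, W)

def random_patch_mix(purified, inp, p, ratio, rng):
    """Assumed patch-level mixing: swap a random fraction of purified patches for input patches."""
    H, W, _ = purified.shape
    a, b = patchify(purified, p), patchify(inp, p)
    take_input = rng.random(len(a)) < ratio
    mixed = np.where(take_input[:, None, None, None], b, a)
    return unpatchify(mixed, H, W)

def combined_loss(pred, target, feat, lam):
    """Pixel (MSE) loss plus lam times a perceptual loss computed in feature space of `feat`."""
    pixel = np.mean((pred - target) ** 2)
    perceptual = np.mean((feat(pred) - feat(target)) ** 2)
    return pixel + lam * perceptual

rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))
adv = np.clip(clean + rng.uniform(-8 / 255, 8 / 255, clean.shape), 0.0, 1.0)
pur = masked_purify(adv, p=8, k=4, reconstruct=toy_reconstruct, rng=rng)
mixed = random_patch_mix(pur, adv, p=8, ratio=0.25, rng=rng)
proj = rng.standard_normal((3, 16))
feat = lambda x: np.tanh(x @ proj)  # toy feature extractor, not a real perceptual network
loss = combined_loss(mixed, clean, feat, lam=0.1)
print(pur.shape, mixed.shape, float(loss))
```

The complementary masks are one simple way to make "reconstruct all patches in parallel" concrete. Each output patch comes from a pass in which its own pixels were hidden, so a perturbation confined to that patch cannot pass through directly. In a trained system, `toy_reconstruct` and `feat` would be replaced by learned networks.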