Adversarial attacks pose a serious challenge to the deployment of deep neural networks (DNNs), yet previous defense models overlook generalization to diverse attacks. Inspired by targeted therapies for cancer, we view adversarial samples as local lesions of natural benign samples. This view rests on a key finding: the salient attack in an adversarial sample dominates the attacking process, whereas the trivial attack unexpectedly provides trustworthy evidence for obtaining generalizable robustness. Based on this finding, we develop a Pixel Surgery and Semantic Regeneration (PSSR) model that follows the targeted therapy mechanism and has three merits: 1) to remove the salient attack, a score-based Pixel Surgery module is proposed, which retains the trivial attack as a kind of invariance information; 2) to restore the discriminative content, a Semantic Regeneration module based on a conditional alignment extrapolator is proposed, which achieves pixel and semantic consistency; 3) to further harmonize robustness and accuracy, which remains an intractable problem, a self-augmentation regularizer with adversarial R-drop is designed. Experiments on numerous benchmarks demonstrate the superiority of PSSR.
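The abstract does not specify how the Pixel Surgery scores or the adversarial R-drop term are computed, so the Python sketch below is only a rough illustration under stated assumptions. It assumes a per-pixel perturbation score is already available (its source, for example a saliency map, is an assumption) and masks the top-scoring fraction as the "salient attack" while leaving the rest untouched. For the regularizer, it uses the bidirectional KL divergence of the original R-Drop method between two stochastic predictions. The names `pixel_surgery`, `ratio`, `fill`, and `r_drop_loss` are hypothetical, and neither function should be read as the PSSR implementation.

```python
import math
from typing import List, Sequence


def pixel_surgery(image: Sequence[float], scores: Sequence[float],
                  ratio: float = 0.1, fill: float = 0.0) -> List[float]:
    """Hypothetical score-based masking (not the PSSR module).

    Replaces the `ratio` fraction of pixels with the highest scores
    (assumed here to be the salient attack) by `fill`; all lower-scoring
    pixels, including any trivial perturbation, are kept unchanged.
    """
    if len(image) != len(scores):
        raise ValueError("image and scores must have the same length")
    k = int(round(ratio * len(image)))
    # Indices of the k pixels with the largest scores.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [fill if i in top else v for i, v in enumerate(image)]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); softmax outputs are strictly positive, so log is safe.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def r_drop_loss(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Bidirectional KL as in R-Drop: 0.5 * (KL(P1||P2) + KL(P2||P1)).

    Assumption: the two logit vectors come from two stochastic forward
    passes (e.g., with dropout) on the same, possibly adversarial, input.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl(p, q) + kl(q, p))


if __name__ == "__main__":
    img = [0.2, 0.9, 0.4, 0.1]
    sc = [0.05, 0.8, 0.3, 0.01]
    print(pixel_surgery(img, sc, ratio=0.25))  # masks only the top-scoring pixel
    print(r_drop_loss([2.0, 0.5, -1.0], [0.1, 1.5, 0.3]))
```

The symmetric form is used because it penalizes disagreement between the two predictions in both directions, which is the consistency property R-Drop relies on; how PSSR adapts it to adversarial inputs is not stated in the abstract.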