Adversarial attacks are commonly regarded as a serious threat to neural networks because they induce misleading behavior. This paper presents the opposite perspective: adversarial attacks can be harnessed to improve neural models if they are amended correctly. Unlike traditional adversarial defense or adversarial training schemes, which aim to improve adversarial robustness, the proposed adversarial amendment (AdvAmd) method aims to improve the original accuracy of neural models on benign samples. We thoroughly analyze the distribution mismatch between benign and adversarial samples. This distribution mismatch, together with the mutual learning mechanism that prior defense strategies apply with the same learning ratio, is the main cause of the accuracy degradation on benign samples. AdvAmd is shown to steadily heal this degradation and even to yield a modest accuracy boost for common neural models on benign classification, object detection, and segmentation tasks. Quantitative and ablation experiments attribute the efficacy of AdvAmd to three key components: mediate samples (to reduce the influence of distribution mismatch through a fine-grained amendment), auxiliary batch norm (to address the mutual learning mechanism and obtain a smoother decision surface), and the AdvAmd loss (to adjust the learning ratios according to different attack vulnerabilities).
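To make two of the components above concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes that "auxiliary batch norm" means keeping separate normalization statistics for benign and adversarial batches, and that the learning ratio can be modeled as a scalar weight on the adversarial loss. The names `DualBatchNorm1d`, `weighted_loss`, and `adv_weight` are hypothetical, and the abstract does not specify how the ratio is derived from attack vulnerability.

```python
from statistics import fmean, pvariance


class DualBatchNorm1d:
    """Toy 1-D batch norm with separate running statistics per branch.

    Assumption: benign batches use the "main" branch and adversarial
    batches use the "aux" branch, so the mismatched distributions never
    share normalization statistics.
    """

    def __init__(self, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        # One [running_mean, running_var] pair per branch.
        self.stats = {"main": [0.0, 1.0], "aux": [0.0, 1.0]}

    def forward(self, batch, branch="main"):
        mean = fmean(batch)
        var = pvariance(batch, mu=mean)
        running = self.stats[branch]
        # Exponential moving average, updated only for the selected branch.
        running[0] = (1 - self.momentum) * running[0] + self.momentum * mean
        running[1] = (1 - self.momentum) * running[1] + self.momentum * var
        # Normalize with the statistics of the current batch.
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


def weighted_loss(benign_loss, adv_loss, adv_weight):
    """Mix the two losses with an unequal ratio instead of a fixed 1:1.

    Assumption: adv_weight is a hypothetical scalar standing in for the
    vulnerability-dependent ratio mentioned in the abstract.
    """
    return benign_loss + adv_weight * adv_loss
```

Routing a benign batch through `forward(batch, "main")` and an adversarial batch through `forward(batch, "aux")` leaves the benign running statistics untouched by the adversarial distribution, which is the intuition the sketch is meant to convey.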