The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted great attention in the machine learning community. The problem is related to the non-flatness and non-smoothness of the loss landscapes obtained by standard training. Training augmented with adversarial examples (a.k.a. adversarial training) is considered an effective remedy. In this paper, we highlight that some collaborative examples, which are nearly perceptually indistinguishable from both adversarial and benign examples yet exhibit far lower prediction loss, can be utilized to enhance adversarial training. A novel method is therefore proposed that achieves new state-of-the-art results in adversarial robustness. Code: https://github.com/qizhangli/ST-AT.
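To make the contrast between adversarial and collaborative examples concrete, the following is a minimal sketch on a toy linear classifier. It is not the method proposed in the paper: the model, the helper names (`logistic_loss`, `loss_grad_x`, `perturb`), the L-infinity budget `eps`, and the signed-gradient search are all illustrative assumptions. The only idea taken from the abstract is that a small perturbation can raise the prediction loss (adversarial) or lower it (collaborative).

```python
import numpy as np

def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a linear classifier on one example (y in {0, 1})."""
    z = float(np.dot(w, x) + b)
    # log(1 + exp(z)) - y * z, computed in a numerically stable way
    return float(np.logaddexp(0.0, z) - y * z)

def loss_grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of class 1
    return (p - y) * w

def perturb(w, b, x, y, eps, step, n_steps, direction):
    """Iterative signed-gradient search inside an L-infinity ball of radius eps.

    direction=+1 ascends the loss (adversarial example);
    direction=-1 descends the loss (collaborative example).
    This search procedure is an assumption made for illustration.
    """
    x_new = x.copy()
    for _ in range(n_steps):
        g = loss_grad_x(w, b, x_new, y)
        x_new = x_new + direction * step * np.sign(g)
        # Project back so the perturbation stays within the budget
        x_new = np.clip(x_new, x - eps, x + eps)
    return x_new

# Hypothetical toy model and benign input
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.4, -0.1])
y = 1
eps = 0.1

x_adv = perturb(w, b, x, y, eps, step=0.025, n_steps=10, direction=+1)
x_col = perturb(w, b, x, y, eps, step=0.025, n_steps=10, direction=-1)

loss_clean = logistic_loss(w, b, x, y)
loss_adv = logistic_loss(w, b, x_adv, y)
loss_col = logistic_loss(w, b, x_col, y)
print(f"collaborative={loss_col:.4f} benign={loss_clean:.4f} adversarial={loss_adv:.4f}")
```

In this toy setting both perturbed inputs stay within `eps` of the benign input, while their losses fall on opposite sides of the benign loss. How such collaborative examples are actually used during training is specified in the paper and the linked repository, not here.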