Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy to protect neural networks from being fooled. However, we find that AT neglects to learn robust features, resulting in poor adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) $\bf{exclusion}$: the features of natural examples stay far from those of other classes; (2) $\bf{alignment}$: the features of natural examples and their corresponding adversarial examples are close to each other. These characteristics motivate us to propose a generic AT framework that obtains robust representations through asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities to push apart examples of different classes in the feature space. Moreover, we propose to weight features by the parameters of the linear classifier, as a form of reverse attention, to obtain class-aware features and pull together the features of the same class. Empirical evaluations on three benchmark datasets show that our methods substantially improve the robustness of AT and achieve state-of-the-art performance. Code is available at <https://github.com/changzhang777/ANCRA>.
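To make the two mechanisms in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation (which is in the linked repository). It assumes that reverse attention multiplies each feature vector element-wise by the linear-classifier weight row of its class, and that the negative contrast can be scored as the cosine similarity between the predicted probabilities of an example and a negative example from a different class. The function names `reverse_attention` and `negative_contrast`, the use of the predicted class as the index, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def reverse_attention(features, weight, class_idx):
    """Weight each feature vector by the classifier row of its class.

    features:  (batch, dim) penultimate-layer features
    weight:    (num_classes, dim) linear classifier weights
    class_idx: (batch,) integer class per example (assumed: label or prediction)
    """
    return features * weight[class_idx]


def negative_contrast(probs_anchor, probs_negative):
    """Mean cosine similarity between anchor and negative probability vectors.

    Minimizing this value pushes the two sets of predictions apart. In an
    autodiff framework the negative side would be treated as a constant
    (the assumed source of the asymmetry); NumPy has no gradients, so that
    step is only noted here.
    """
    a = probs_anchor / np.linalg.norm(probs_anchor, axis=1, keepdims=True)
    n = probs_negative / np.linalg.norm(probs_negative, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * n, axis=1)))


# Toy data: 4 examples, 8-dimensional features, 3 classes (all hypothetical).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
weight = rng.normal(size=(3, 8))

# First pass through the linear classifier gives a class per example.
predicted = (features @ weight.T).argmax(axis=1)

# Class-aware features, then a second pass through the same classifier.
class_aware = reverse_attention(features, weight, predicted)
probs = softmax(class_aware @ weight.T)

# Score the first two examples against the last two as stand-in negatives.
score = negative_contrast(probs[:2], probs[2:])
```

In a real training loop the negatives would be sampled from classes other than the anchor's, and the contrast score would be added to the AT loss with a weighting coefficient; both of those details are left out of this sketch.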