It is now widely known that deep neural networks (DNNs) are highly vulnerable to adversarial attacks. To mitigate this vulnerability, many defense algorithms have been proposed. More recently, several works have sought to improve adversarial robustness by enhancing feature representations through more direct supervision on discriminative features. However, existing approaches lack an understanding of how to learn adversarially robust feature representations. In this paper, we propose a novel training framework called Robust Proxy Learning, in which the model explicitly learns robust feature representations using robust proxies. To this end, we first demonstrate that class-representative robust features can be generated by adding class-wise robust perturbations. We then use these class-representative features as robust proxies, with which the model explicitly learns adversarially robust features. Through extensive experiments, we verify that robust features can be generated manually and that the proposed learning framework can increase the robustness of DNNs.