Adversarial robustness of neural networks is a significant concern when they are deployed in security-critical domains. In this setting, adversarial distillation is a promising approach that distills the robustness of a teacher network to improve the robustness of a small student network. Previous works pretrain the teacher network to make it robust against adversarial examples aimed at itself. However, adversarial examples depend on the parameters of the target network. A fixed teacher network therefore inevitably loses robustness against the unseen, transferred adversarial examples that target the parameters of the student network during adversarial distillation. We propose PeerAiD, which makes a peer network learn the adversarial examples of the student network instead of adversarial examples aimed at itself. PeerAiD is an adversarial distillation method that trains the peer network and the student network simultaneously so that the peer network specializes in defending the student network. We observe that such peer networks surpass the robustness of a pretrained robust teacher network against adversarial samples generated by attacking the student. With this peer network and adversarial distillation, PeerAiD achieves significantly higher robustness of the student network, improving AutoAttack (AA) accuracy by up to 1.66%p, and improves the natural accuracy of the student network by up to 4.72%p with ResNet-18 on the TinyImageNet dataset.
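To make the training scheme concrete, the following is a minimal toy sketch of the idea, not the PeerAiD implementation. It assumes two linear logistic classifiers in place of deep networks, a one-step FGSM attack in place of whatever attack the method actually uses, and a simple student loss that blends the peer's soft prediction with the true label through a hypothetical weight `alpha`. All function names are illustrative, and the actual loss terms are not specified in this abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def predict(w, x):
    # Probability of class 1 under a linear model; the last weight is the bias.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def fgsm_on_student(student, x, y, eps):
    # One-step L-infinity attack that uses the STUDENT's input gradient.
    g = predict(student, x) - y  # d(cross-entropy) / d(logit)
    return [xi + eps * sign(g * wi) for xi, wi in zip(x, student)]


def sgd_step(w, x, err, lr):
    # In-place gradient step for cross-entropy, where err = prediction - target.
    for i, xi in enumerate(x):
        w[i] -= lr * err * xi
    w[-1] -= lr * err


def train_jointly(data, eps, lr=0.1, epochs=20, alpha=0.5):
    dim = len(data[0][0])
    student, peer = [0.0] * (dim + 1), [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in data:
            # Adversarial example aimed at the student's current parameters.
            x_adv = fgsm_on_student(student, x, y, eps)
            # The peer learns the student-attacked example with the true label.
            sgd_step(peer, x_adv, predict(peer, x_adv) - y, lr)
            # The student distills from the peer (assumed blend with the label).
            target = alpha * predict(peer, x_adv) + (1 - alpha) * y
            sgd_step(student, x_adv, predict(student, x_adv) - target, lr)
    return student, peer


def robust_accuracy(model, attacker, data, eps):
    # Accuracy of `model` on FGSM examples crafted against `attacker`.
    hits = 0
    for x, y in data:
        x_adv = fgsm_on_student(attacker, x, y, eps)
        hits += int((predict(model, x_adv) >= 0.5) == (y == 1))
    return hits / len(data)


def make_blobs(n):
    # Two Gaussian blobs in 2-D, one per class (toy data for illustration).
    data = []
    for i in range(n):
        y = i % 2
        c = 2.0 if y == 1 else -2.0
        data.append(([random.gauss(c, 0.5), random.gauss(c, 0.5)], y))
    random.shuffle(data)
    return data


if __name__ == "__main__":
    random.seed(0)
    data = make_blobs(200)
    student, peer = train_jointly(data, eps=0.5)
    print("student vs. student-attack:", robust_accuracy(student, student, data, 0.5))
    print("peer vs. student-attack:", robust_accuracy(peer, student, data, 0.5))
```

The point of the sketch is the data flow: the peer never sees adversarial examples aimed at itself, only those crafted against the student's current parameters, and both models are updated in the same loop. A conventional adversarial distillation baseline would instead freeze a pretrained teacher in place of `peer`.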