Adversarial attacks pose a significant threat to the security and safety of deep neural networks deployed in modern applications. In computer vision tasks in particular, attackers with knowledge of the model architecture can craft adversarial samples that are imperceptible to the human eye. Such attacks can cause security problems in popular applications such as self-driving cars and face recognition. Building networks that are robust to these attacks is therefore essential. Among the various defenses proposed in the literature, defensive distillation has shown promise in recent years: using knowledge distillation, researchers have been able to create models that are robust against some of these attacks. However, more recent attacks have exposed weaknesses in defensive distillation. In this project, we draw inspiration from teacher assistant knowledge distillation and propose that introducing an assistant network can improve the robustness of the distilled model. Through a series of experiments, we evaluate the distilled models at different distillation temperatures in terms of accuracy, sensitivity, and robustness. Our experiments show that the proposed approach improves robustness in most cases. Additionally, we show that multi-step distillation can further improve robustness with very little impact on model accuracy.
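As a rough illustration of the two ideas the approach relies on, here is a sketch of the temperature-scaled softmax used to produce soft labels in defensive distillation, and of how a teacher, assistant, student chain decomposes into pairwise distillation steps. It is a minimal, framework-free sketch under stated assumptions: `softmax_t`, `distillation_loss`, and `distillation_chain` are hypothetical helpers, not the authors' code, and real training would operate on network outputs inside a deep-learning framework.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Temperature-scaled softmax: softmax(z / T). Higher T gives softer labels."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy of the student's softened predictions against the
    teacher's soft labels, both computed at the same temperature T."""
    p_teacher = softmax_t(teacher_logits, temperature)
    p_student = softmax_t(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def distillation_chain(models):
    """Hypothetical helper: list the (teacher, student) pairs for a chain
    such as teacher -> assistant(s) -> student. Each pair is one distillation step."""
    return list(zip(models[:-1], models[1:]))


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print(softmax_t(logits, 1.0))   # fairly peaked distribution
    print(softmax_t(logits, 20.0))  # much flatter "soft labels"
    print(distillation_chain(["teacher", "assistant", "student"]))
```

With `temperature=20.0` the soft labels for the same logits are much flatter than at `temperature=1.0`. In defensive distillation the distilled model is trained on such soft labels at a high temperature and then deployed at T = 1. Passing a longer list to `distillation_chain` (for example, with two assistants) gives the multi-step variant, where each intermediate network acts as the teacher for the next.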