Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD setting, previous works overlook what we term the relative calibration of the student network with respect to its teacher in terms of soft confidences. In particular, we focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples? These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting. To address these questions, we introduce a faithful imitation framework to discuss the relative calibration of confidences and provide empirical and certified methods to evaluate the relative calibration of a student w.r.t. its teacher. Further, to verifiably align the relative calibration incentives of the student to those of its teacher, we introduce faithful distillation. Our experiments on the MNIST, Fashion-MNIST and CIFAR-10 datasets demonstrate the need for such an analysis and the advantages of the increased verifiability of faithful distillation over alternative adversarial distillation methods.
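As an illustration of the two questions above, the following is a minimal sketch of an empirical probe, not the method of this work. It assumes an L-infinity ball of radius `eps` around a single example and uses plain random search; the function name `probe_relative_calibration`, the perturbation model, and the gap measure (largest absolute difference in softmax outputs) are hypothetical choices made for illustration. Random search can only report a lower bound on the true worst-case gap, so it cannot certify anything.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def probe_relative_calibration(teacher, student, x, eps, n_samples=256, seed=0):
    """Random-search probe around one example x (1-D array).

    teacher, student: callables mapping a batch (n, d) to logits (n, k).
    Assumption: perturbations are drawn uniformly from the L-infinity
    ball of radius eps; this is an illustrative choice only.

    Returns (disagree, max_gap):
      disagree: True if the predicted labels differ at any sampled point,
      max_gap:  largest absolute difference in softmax confidence found.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-eps, eps, size=(n_samples, x.shape[0]))
    # Include the clean example itself alongside the perturbed copies.
    pts = np.vstack([x[None, :], x[None, :] + noise])
    p_t = softmax(teacher(pts))
    p_s = softmax(student(pts))
    disagree = bool(np.any(p_t.argmax(axis=1) != p_s.argmax(axis=1)))
    max_gap = float(np.abs(p_t - p_s).max())
    return disagree, max_gap


if __name__ == "__main__":
    # Toy linear "networks" standing in for a real teacher-student pair.
    W = np.array([[2.0, -1.0], [-1.0, 2.0]])
    teacher = lambda pts: pts @ W
    student = lambda pts: pts @ (0.5 * W)  # same labels, softer confidences
    x = np.array([1.0, 0.0])
    print(probe_relative_calibration(teacher, student, x, eps=0.1))
```

In the toy example the student predicts the same label as the teacher everywhere in the ball but is less confident, which is the situation question (ii) is meant to detect. A certified evaluation would replace the sampling step with a sound bound over the whole ball.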