Neural networks trained on distilled data often produce over-confident outputs and require correction by calibration methods. Existing calibration methods, such as temperature scaling and mixup, work well for networks trained on the original large-scale data. However, we find that these methods fail to calibrate networks trained on data distilled from large source datasets. In this paper, we show that distilled data lead to networks that are not calibratable, for two reasons: (i) the distribution of the maximum logits is more concentrated, and (ii) information that is semantically meaningful but unrelated to the classification task is lost. To address this problem, we propose Masked Temperature Scaling (MTS) and Masked Distillation Training (MDT), which mitigate these limitations of distilled data and achieve better calibration while preserving the efficiency of dataset distillation.
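For readers unfamiliar with the baseline, the sketch below shows standard post-hoc temperature scaling, which divides all logits by one scalar T fitted on held-out data, together with a binned expected calibration error (ECE). It is not MTS or MDT, whose details are not given here. Several choices are assumptions made for illustration: T is fitted by a simple grid search over validation negative log-likelihood rather than gradient descent, ECE uses 15 equal-width bins, and the "over-confident" logits are synthetic data constructed so that the ideal temperature is about 5.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax of logits divided by a scalar temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits: np.ndarray, labels: np.ndarray, temperature: float) -> float:
    """Mean negative log-likelihood of the true labels under the scaled softmax."""
    probs = softmax(logits, temperature)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(logits: np.ndarray, labels: np.ndarray) -> float:
    """Assumption: pick T minimizing validation NLL on a log-spaced grid."""
    grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 400))
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """Bin-size-weighted mean |accuracy - confidence| over equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

# Hypothetical validation set: softmax(clean) is calibrated by construction,
# so multiplying the logits by 5 makes the model over-confident.
rng = np.random.default_rng(0)
n, k = 2000, 10
labels = rng.integers(0, k, size=n)
clean = rng.normal(size=(n, k))
clean[np.arange(n), labels] += 1.0
logits = 5.0 * clean

t = fit_temperature(logits, labels)
ece_before = expected_calibration_error(softmax(logits), labels)
ece_after = expected_calibration_error(softmax(logits, t), labels)
print(f"T={t:.2f}  ECE before={ece_before:.3f}  after={ece_after:.3f}")
```

Because T is a single positive scalar, it rescales confidences without changing any prediction. This is why it can repair uniformly inflated logits, and why, per the abstract, it is not sufficient on its own for networks trained on distilled data.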