In early convolutional neural networks (CNNs), the primary goal of training was higher generalization performance. However, since the introduction of the expected calibration error (ECE), which quantifies how well a model's predicted confidence matches its actual accuracy, research on training models whose inferences can be trusted and explained has been gaining momentum. We hypothesized that a gap between the supervision criteria used during training and inference leads to overconfidence, and investigated whether label distribution learning (LDL) improves model calibration in CNN training. To test this hypothesis, we used a simple LDL setting combined with recent data augmentation techniques. A series of experiments yielded the following results: 1) state-of-the-art knowledge distillation (KD) methods significantly impair model calibration; 2) training with LDL and recent data augmentation can substantially improve model calibration and even generalization performance; 3) online LDL brings additional gains in calibration and accuracy under long training schedules, especially for large models. Using the proposed approach, we simultaneously achieved lower ECE and higher generalization performance on the image classification datasets CIFAR-10, CIFAR-100, STL-10, and ImageNet. Through several visualizations and analyses, we observed a number of interesting behaviors in CNN training with LDL.
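For reference, ECE is commonly computed by grouping predictions into M equal-width confidence bins and summing, over bins, the gap between accuracy and mean confidence weighted by the fraction of samples in each bin: ECE = sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)|. The sketch below assumes this standard binned formulation with 15 bins. The function name `expected_calibration_error` and the default bin count are illustrative choices, not necessarily the configuration used in the experiments above.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=15):
    """Binned ECE (assumed standard formulation, equal-width bins).

    probs:  (n_samples, n_classes) predicted class probabilities
    labels: (n_samples,) integer ground-truth labels
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidences = probs.max(axis=1)      # top-1 confidence per sample
    predictions = probs.argmax(axis=1)   # top-1 predicted class
    correct = (predictions == labels).astype(float)

    n = len(labels)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Bins are half-open (lo, hi]; a top-1 softmax confidence is never exactly 0.
        in_bin = (confidences > lo) & (confidences <= hi)
        count = in_bin.sum()
        if count == 0:
            continue
        acc = correct[in_bin].mean()       # empirical accuracy in this bin
        conf = confidences[in_bin].mean()  # average confidence in this bin
        ece += (count / n) * abs(acc - conf)
    return ece
```

As a usage example, an overconfident classifier that predicts class 0 with probability 0.9 for four samples, of which only two are actually class 0, gets `expected_calibration_error([[0.9, 0.1]] * 4, [0, 0, 1, 1])` of about 0.4: its accuracy (0.5) falls well below its confidence (0.9). A lower value indicates that confidence tracks accuracy more closely.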