Deep Neural Networks (DNNs) have shown great promise in many classification applications, yet are widely known to produce poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising model accuracy is of great importance and interest in safety-critical applications such as the healthcare sector. In this work, we show that decoupling the training of the feature extraction layers and the classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Vision Transformers (ViT) significantly improves model calibration whilst retaining accuracy, at a low training cost. In addition, we show that placing a Gaussian prior on the outputs of the last hidden layer of a DNN, and training the model variationally during the classification stage, further improves calibration. We illustrate that these methods improve calibration across ViT and WRN architectures on several image classification benchmark datasets.
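A minimal sketch of this two-stage recipe in PyTorch is given below. The `backbone`/head split, the standard-normal prior, the KL weight `beta`, the optimizer choices, and the epoch counts are all illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalHead(nn.Module):
    """Classifier head with a Gaussian prior on the last hidden features.

    Predicts a per-example mean and log-variance, samples
    z ~ N(mu, sigma^2) via the reparameterization trick, and classifies z.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.mu = nn.Linear(feat_dim, feat_dim)
        self.logvar = nn.Linear(feat_dim, feat_dim)
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL divergence to a standard-normal prior (an assumed choice),
        # averaged over the batch.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
        return self.fc(z), kl


def train_decoupled(backbone, linear_head, var_head, loader,
                    epochs_feat=100, epochs_cls=20, beta=1e-3):
    # Stage 1: train the feature extractor and a plain linear head end to end.
    opt1 = torch.optim.SGD(
        list(backbone.parameters()) + list(linear_head.parameters()),
        lr=0.1, momentum=0.9)
    for _ in range(epochs_feat):
        for x, y in loader:
            loss = F.cross_entropy(linear_head(backbone(x)), y)
            opt1.zero_grad()
            loss.backward()
            opt1.step()

    # Decouple: freeze the feature extractor before the classification stage.
    for p in backbone.parameters():
        p.requires_grad_(False)
    backbone.eval()

    # Stage 2: retrain only the classifier, variationally, on frozen features.
    opt2 = torch.optim.Adam(var_head.parameters(), lr=1e-3)
    for _ in range(epochs_cls):
        for x, y in loader:
            with torch.no_grad():
                h = backbone(x)
            logits, kl = var_head(h)
            # ELBO-style objective: cross-entropy plus a weighted KL term.
            loss = F.cross_entropy(logits, y) + beta * kl
            opt2.zero_grad()
            loss.backward()
            opt2.step()
```

Because the second stage updates only the classifier head on frozen features, its cost is a small fraction of the first stage, consistent with the low training cost noted above.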