Deep Neural Networks (DNNs) have shown great promise in many classification applications, yet are widely known to produce poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising model accuracy is of great importance in safety-critical applications such as the healthcare sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Vision Transformers (ViT) significantly improves model calibration whilst retaining accuracy, at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN and training the model variationally in the classification stage further improves calibration. We demonstrate that these methods improve calibration across ViT and WRN architectures on several image classification benchmark datasets.
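A minimal sketch of the two-stage recipe described above, assuming a PyTorch-style setup: stage 1 trains the full network conventionally, and stage 2 freezes the feature extractor and trains only the classification head with a Gaussian prior on the last hidden layer via the reparameterization trick. The class names, the KL weight `beta`, and the standard-normal prior parameterization are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalHead(nn.Module):
    """Classification head with a Gaussian prior on the last hidden layer.

    Predicts a mean and log-variance for the latent features, samples via
    the reparameterization trick, and is regularized toward a standard
    normal prior with a KL term (a common variational recipe; the paper's
    exact parameterization may differ).
    """
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.mu = nn.Linear(feat_dim, feat_dim)
        self.logvar = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL divergence between N(mu, sigma^2) and the N(0, I) prior
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return self.classifier(z), kl

def train_classification_stage(backbone, head, loader, beta=1e-3, epochs=5):
    """Stage 2: freeze the feature extractor, train only the head.

    `backbone` is assumed to have been trained end-to-end in stage 1.
    """
    for p in backbone.parameters():
        p.requires_grad_(False)
    backbone.eval()
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                h = backbone(x)  # frozen features from stage 1
            logits, kl = head(h)
            # ELBO-style objective: cross-entropy plus weighted KL term
            loss = F.cross_entropy(logits, y) + beta * kl
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The decoupling itself requires nothing beyond freezing the backbone and retraining the head, so the added training cost is limited to a few epochs over precomputable features; the variational head adds only two linear layers on top of that.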