Studies have shown that modern neural networks tend to be poorly calibrated because their predictions are over-confident. Traditionally, post-processing methods have been used to calibrate a model after training. In recent years, various trainable calibration measures have been proposed that can be incorporated directly into the training process. However, these methods all introduce internal hyperparameters, and the performance of these calibration objectives depends on tuning them, which incurs growing computational cost as neural networks and datasets become larger. We therefore present Expected Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable calibration objective, in which we view the calibration error from the perspective of the squared difference between two expectations. With extensive experiments on several architectures (CNNs, Transformers) and datasets, we demonstrate that (1) incorporating ESD into training improves model calibration across various batch size settings without any internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous approaches, and (3) ESD drastically reduces the computational cost required for calibration during training, owing to the absence of internal hyperparameters. The code is publicly accessible at https://github.com/hee-suk-yoon/ESD.
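To make the phrase "squared difference between two expectations" concrete, the following is a minimal sketch of one plausible plug-in estimator. It rests on an assumption not stated in the abstract: that calibration is checked by comparing, for each confidence threshold taken from the batch, the expected correctness and the expected confidence over samples at or below that threshold, and then averaging the squared gaps. The function name `squared_difference_calibration` and its arguments are hypothetical, and this is not the authors' implementation; the linked repository is the authoritative source.

```python
def squared_difference_calibration(confidences, correct):
    """Plug-in estimate of a squared-difference calibration gap (illustrative).

    confidences: predicted confidence of the top class, one float per sample.
    correct:     1 if the prediction was right, 0 otherwise, one per sample.

    Assumed formulation (not confirmed by the abstract): for each threshold t
    drawn from the batch, average (correct - confidence) over samples whose
    confidence is <= t (zero contribution otherwise), then average the
    squares of these values over all thresholds.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-sample gap between observed correctness and stated confidence.
    gaps = [float(c) - float(p) for p, c in zip(confidences, correct)]

    total = 0.0
    for threshold in confidences:
        # Inner expectation: mean gap restricted to samples at or below the threshold.
        inner = sum(g for p, g in zip(confidences, gaps) if p <= threshold) / n
        total += inner ** 2

    # Outer expectation over thresholds.
    return total / n


# A batch that is right half the time at confidence 0.5 has no gap,
# while the same accuracy at confidence 1.0 is over-confident.
calibrated = squared_difference_calibration([0.5, 0.5], [1, 0])
overconfident = squared_difference_calibration([1.0, 1.0], [1, 0])
```

Note that the sketch takes no bin count or kernel bandwidth, which is the sense in which such an objective can be free of internal hyperparameters. It is written in plain Python for readability; a training-time version would need a differentiable tensor implementation.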