Neural networks that solve real-world problems are often required not only to make accurate predictions but also to report how confident they are in each prediction. A model's calibration indicates how close its estimated confidence is to the true probability of being correct. This paper surveys confidence calibration in the context of neural networks and provides an empirical comparison of calibration methods. We analyze the problem statement, the definitions of calibration, and different approaches to evaluation: visualizations and scalar measures that estimate whether a model is well calibrated. We review modern calibration techniques, both those based on post-processing and those that require changes to training. The experiments cover a range of datasets and models and compare calibration methods according to several criteria.
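To make the idea of a scalar calibration measure concrete, here is a minimal sketch of expected calibration error (ECE), one widely used measure of this kind. The abstract does not say which measures the paper evaluates, so the choice of ECE, the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative sketch, not the paper's code).

    confidences: predicted confidence of the chosen class, values in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width confidence bins (assumed default).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; sample i falls in bin b when edge[b] < conf <= edge[b+1].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between observed accuracy and mean confidence in this bin,
            # weighted by the fraction of samples that landed in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A value near zero means that, within each confidence bin, the mean confidence is close to the observed accuracy; larger values indicate over- or under-confidence. The result depends on the binning scheme, so comparisons are only meaningful when the same `n_bins` is used throughout.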