For classification models based on neural networks, the maximum predicted class probability is often used as a confidence score. In practice, this score is a poor estimate of the probability that the prediction is correct, so a post-processing calibration step is required. However, many confidence calibration methods fail on problems with many classes. To address this issue, we transform the problem of calibrating a multiclass classifier into that of calibrating a single surrogate binary classifier. This approach allows standard calibration methods to be used more effectively. We evaluate our approach on numerous neural networks used for image and text classification and show that it significantly enhances existing calibration methods.
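The surrogate-binary idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes held-out validation data, uses the top-1 correctness indicator as the binary target, and applies Platt scaling (logistic regression on the confidence score) via scikit-learn as the standard binary calibrator; isotonic regression would work equally well.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a validation set: softmax outputs and true labels.
rng = np.random.default_rng(0)
n, k = 1000, 10  # n examples, k classes (hypothetical sizes)
logits = rng.normal(size=(n, k))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, k, size=n)

# Surrogate binary problem:
#   score  = maximum predicted class probability (the usual confidence),
#   target = 1 if the top-1 prediction is correct, else 0.
conf = probs.max(axis=1)
correct = (probs.argmax(axis=1) == labels).astype(int)

# Calibrate the binary problem with Platt scaling: fit a logistic
# regression mapping the raw confidence to P(prediction is correct).
calibrator = LogisticRegression().fit(conf.reshape(-1, 1), correct)
calibrated_conf = calibrator.predict_proba(conf.reshape(-1, 1))[:, 1]
```

At test time, the same fitted calibrator maps each new example's maximum softmax probability to a calibrated confidence, sidestepping the need to calibrate all k class probabilities jointly.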