The assessment of binary classifier performance traditionally centers on discriminative ability, measured with metrics such as accuracy. However, these metrics often disregard the model's inherent uncertainty, which matters especially in sensitive decision-making domains such as finance or healthcare. Because model-predicted scores are commonly interpreted as event probabilities, calibration is crucial for their correct interpretation. In this study, we analyze the sensitivity of various calibration measures to score distortions and introduce a refined metric, the Local Calibration Score. After comparing recalibration methods, we advocate for local regressions, emphasizing their dual role as effective recalibration tools and as facilitators of smoother calibration visualizations. We apply these findings to a real-world credit default prediction task using a Random Forest classifier and a Random Forest regressor, measuring calibration throughout performance optimization.
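To make the idea of recalibration by local regression concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a Gaussian-kernel local linear regression (one member of the local-regression family) to estimate the event frequency at each score, and a standard binned expected calibration error (ECE) to measure miscalibration before and after. The function names, the bandwidth of 0.1, the ten equal-width bins, and the simulated data (true probabilities `p`, distorted scores `p**3`) are all hypothetical choices for illustration; the Local Calibration Score itself is not reproduced here.

```python
import math
import random


def local_linear_fit(scores, labels, x0, bandwidth):
    """Estimate P(Y = 1 | score = x0) with a Gaussian-kernel local linear fit."""
    # Kernel weights: observations whose scores lie near x0 count more.
    w = [math.exp(-0.5 * ((s - x0) / bandwidth) ** 2) for s in scores]
    sw = sum(w)
    # Weighted means of the centred score (s - x0) and of the label.
    mx = sum(wi * (s - x0) for wi, s in zip(w, scores)) / sw
    my = sum(wi * y for wi, y in zip(w, labels)) / sw
    # Weighted least squares slope of label on centred score.
    sxx = sum(wi * (s - x0 - mx) ** 2 for wi, s in zip(w, scores))
    sxy = sum(wi * (s - x0 - mx) * (y - my) for wi, s, y in zip(w, scores, labels))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    # The intercept is the fitted value at x0; clip it to a valid probability.
    return min(1.0, max(0.0, my - slope * mx))


def recalibrate(scores, labels, bandwidth=0.1):
    """Replace every score by its locally regressed event frequency."""
    return [local_linear_fit(scores, labels, s, bandwidth) for s in scores]


def expected_calibration_error(scores, labels, n_bins=10):
    """Binned ECE: size-weighted mean of |mean score - event rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        bins[min(int(s * n_bins), n_bins - 1)].append((s, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_score = sum(s for s, _ in b) / len(b)
            event_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / len(scores) * abs(mean_score - event_rate)
    return ece


# Simulated example (hypothetical data): true probabilities p, outcomes drawn
# from Bernoulli(p), and a distorted score p**3 that understates the risk.
rng = random.Random(0)
true_p = [rng.random() for _ in range(800)]
labels = [1 if rng.random() < p else 0 for p in true_p]
raw_scores = [p ** 3 for p in true_p]

recal_scores = recalibrate(raw_scores, labels, bandwidth=0.1)
raw_ece = expected_calibration_error(raw_scores, labels)
recal_ece = expected_calibration_error(recal_scores, labels)
print(f"ECE before: {raw_ece:.3f}, after local regression: {recal_ece:.3f}")
```

The fitted curve `recal_scores` against `raw_scores` doubles as a smoothed calibration plot, which reflects the dual role of local regression described above. Note that this sketch recalibrates and evaluates on the same sample; in practice the smoother would be fitted on held-out data, and the O(n²) loop would be replaced by evaluation on a grid of scores.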