This study introduces novel superior scoring rules, the Penalized Brier Score (PBS) and the Penalized Logarithmic Loss (PLL), to improve model evaluation for probabilistic classification. Traditional scoring rules such as the Brier Score and Logarithmic Loss sometimes assign better scores to misclassified predictions than to correct ones. This mismatch with the practical preference for correct classifications can lead to suboptimal model selection. By integrating penalties for misclassifications, PBS and PLL modify the traditional proper scoring rules so that correct predictions consistently receive better scores. Formal proofs demonstrate that PBS and PLL remain strictly proper scoring rules while preferentially rewarding accurate classifications. Experiments showcase the benefits of using PBS and PLL for model selection, model checkpointing, and early stopping. During training, PBS exhibits a stronger negative correlation with the F1 score than the Brier Score does, so it identifies optimal checkpoints and early-stopping points more effectively, leading to improved F1 scores. Comparative analysis verifies that models selected by PBS and PLL achieve superior F1 scores. PBS and PLL thus bridge the gap between uncertainty quantification and accuracy maximization by combining proper scoring principles with an explicit preference for correct classifications. The proposed metrics can enhance model evaluation and selection for reliable probabilistic classification.
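To make the idea concrete, here is a minimal sketch of penalized scoring rules of the kind the abstract describes. The abstract does not give the exact formulas, so this assumes the simplest form: the ordinary per-example Brier Score or negative log-likelihood plus a fixed constant `penalty` added whenever the argmax prediction is wrong. The paper's actual penalty term may differ.

```python
import numpy as np

def brier_score(probs, labels):
    # Standard multiclass Brier Score: mean squared error between
    # predicted probability vectors and one-hot encoded labels (lower is better).
    onehot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))

def penalized_brier_score(probs, labels, penalty=1.0):
    # Hypothetical PBS form: add a fixed penalty to every misclassified
    # example, so a wrong argmax prediction cannot outscore a correct one.
    onehot = np.eye(probs.shape[1])[labels]
    per_example = np.sum((probs - onehot) ** 2, axis=1)
    wrong = (np.argmax(probs, axis=1) != labels).astype(float)
    return np.mean(per_example + penalty * wrong)

def penalized_log_loss(probs, labels, penalty=1.0):
    # Hypothetical PLL form: negative log-likelihood of the true class,
    # again penalized per misclassified example.
    nll = -np.log(probs[np.arange(len(labels)), labels])
    wrong = (np.argmax(probs, axis=1) != labels).astype(float)
    return np.mean(nll + penalty * wrong)
```

The failure mode the abstract mentions is easy to exhibit: with true class 0, the correct but uncertain prediction `[0.34, 0.33, 0.33]` gets Brier Score 0.6534, while the misclassified prediction `[0.40, 0.41, 0.19]` gets the *better* score 0.5642. Under the penalized version, the misclassified prediction scores strictly worse, matching the preference for correct classifications.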