JUCAL: Jointly Calibrating Aleatoric and Epistemic Uncertainty in Classification Tasks

We study post-calibration uncertainty for trained ensembles of classifiers, considering both aleatoric (label noise) and epistemic (model) uncertainty. Among the most widely used calibration methods in classification are temperature scaling (i.e., pool-then-calibrate) and conformal methods. Their main shortcoming is that they do not balance the proportion of aleatoric and epistemic uncertainty. Failing to balance the two can severely misrepresent predictive uncertainty, leading to overconfident predictions in some input regions and underconfident predictions in others. To address this shortcoming, we present a simple but powerful calibration algorithm, Joint Uncertainty Calibration (JUCAL), which jointly calibrates aleatoric and epistemic uncertainty. JUCAL fits two constants that weight and scale epistemic and aleatoric uncertainty by minimizing the negative log-likelihood (NLL) on the validation/calibration dataset. JUCAL can be applied to any trained ensemble of classifiers (e.g., transformers, CNNs, or tree-based methods) with minimal computational overhead and without access to the models' internal parameters. We experimentally evaluate JUCAL on various text classification tasks, for ensembles of varying sizes and with different ensembling strategies. Our experiments show that JUCAL significantly outperforms state-of-the-art calibration methods across all considered classification tasks, reducing NLL and predictive set size by up to 15% and 20%, respectively. Interestingly, even JUCAL applied to an ensemble of size 5 can outperform temperature-scaled ensembles of size up to 50 in terms of NLL and predictive set size, which corresponds to inference costs up to 10 times smaller. Thus, we propose JUCAL as a new go-to method for calibrating ensembles in classification.
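To make the idea of fitting two constants by NLL minimization concrete, the following is a minimal sketch, not the reference implementation. The abstract does not state the exact parameterization, so the sketch assumes one plausible form: a constant c1 acts as a temperature on every member's logits (mainly affecting aleatoric uncertainty), and a constant c2 rescales each member's deviation from the ensemble-mean logits (mainly affecting epistemic uncertainty). The function names, the grid search, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def jucal_probs(logits, c1, c2):
    """Predictive probabilities under the ASSUMED two-constant form.

    logits: array of shape (M, N, K) for M members, N inputs, K classes.
    c1 > 0: temperature applied to every member's logits.
    c2 >= 0: scales member disagreement around the ensemble-mean logits.
    """
    scaled = logits / c1
    mean = scaled.mean(axis=0, keepdims=True)
    adjusted = mean + c2 * (scaled - mean)
    # Average the member probabilities to obtain the ensemble prediction.
    return softmax(adjusted).mean(axis=0)


def nll(probs, labels):
    """Mean negative log-likelihood of integer labels under probs (N, K)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(picked, 1e-12, None)).mean())


def fit_jucal(logits, labels, c1_grid, c2_grid):
    """Grid search for the (c1, c2) pair with the lowest calibration NLL."""
    best_loss, best_c1, best_c2 = np.inf, None, None
    for c1 in c1_grid:
        for c2 in c2_grid:
            loss = nll(jucal_probs(logits, c1, c2), labels)
            if loss < best_loss:
                best_loss, best_c1, best_c2 = loss, c1, c2
    return best_c1, best_c2, best_loss


if __name__ == "__main__":
    # Synthetic, deliberately overconfident ensemble logits (illustration only).
    rng = np.random.default_rng(0)
    M, N, K = 5, 200, 3
    labels = rng.integers(0, K, size=N)
    logits = rng.normal(size=(M, N, K))
    logits[:, np.arange(N), labels] += 1.0  # weak signal for the true class
    logits *= 4.0                           # inflate confidence

    c1_grid = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
    c2_grid = [0.0, 0.5, 1.0, 2.0, 4.0]
    c1, c2, loss = fit_jucal(logits, labels, c1_grid, c2_grid)
    baseline = nll(jucal_probs(logits, 1.0, 1.0), labels)
    print(f"c1={c1}, c2={c2}, NLL={loss:.4f} (uncalibrated: {baseline:.4f})")
```

Under this assumed form, c1 = c2 = 1 recovers the plain average of the members' softmax outputs, so the fitted NLL on the calibration set can never be worse than the uncalibrated ensemble as long as the grid contains that point. The sketch only needs the members' output logits, which matches the abstract's statement that no access to internal model parameters is required.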
