Under dataset shift, model calibration plays a pivotal role in ensuring the reliability of machine learning systems. Calibration error (CE) measures how well a classifier's predicted probabilities align with its accuracy. While prior work has examined the implications of dataset shift for calibration, existing CE estimators assume access to labels from the target domain, which are often unavailable in practice, for example once the model has been deployed. This work addresses this challenging scenario and proposes a novel CE estimator under label shift, which is characterized by a change in the marginal label distribution $p(Y)$ while the conditional $p(X|Y)$ remains constant between the source and target distributions. Our approach leverages importance re-weighting of the labeled source distribution to provide consistent and asymptotically unbiased CE estimation with respect to the shifted target distribution. Empirical results across diverse real-world datasets, under various conditions and label-shift intensities, demonstrate the effectiveness and reliability of the proposed estimator.
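To make the re-weighting idea concrete, the following is a minimal sketch, not the estimator proposed in this work. It assumes a standard binned expected calibration error and assumes the per-class importance weights $w(y) = p_t(y)/p_s(y)$ are already available (in practice they would have to be estimated from unlabeled target data). The function name `weighted_ece` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_ece(confidences, correct, labels, class_weights, n_bins=10):
    """Binned ECE on labeled source data, re-weighted toward the target.

    Assumptions (illustrative, not taken from the paper):
      - confidences: top-class predicted probability per source sample
      - correct: 1 if the prediction was right, 0 otherwise
      - labels: integer source labels, used only to look up the weights
      - class_weights: assumed-known ratios p_target(y) / p_source(y)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Per-sample importance weight, normalized to sum to one.
    w = np.asarray(class_weights, dtype=float)[labels]
    w = w / w.sum()

    # Equal-width bins on [0, 1]; digitize against the interior edges
    # so that bin indices run from 0 to n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        mass = w[mask].sum()  # weighted fraction of samples in this bin
        if mass == 0.0:
            continue
        acc = np.sum(w[mask] * correct[mask]) / mass       # weighted accuracy
        conf = np.sum(w[mask] * confidences[mask]) / mass  # weighted confidence
        ece += mass * abs(acc - conf)
    return ece
```

With all class weights equal to one, this reduces to the usual binned ECE on the source data. Weights that differ from one shift the estimate toward the calibration error that would be observed under the target label distribution, provided the label-shift assumption holds.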