Concepts of calibration formalize the compatibility between probabilistic predictions and the respective outcomes. In a nutshell, the outcomes ought to be indistinguishable from random draws from the predictive distributions. In this paper, we review, extend, and bridge notions of calibration that have been proposed for classification and regression tasks. Particular emphasis is given to hierarchical relations between the various notions, as they apply to general real-valued data, continuous outcomes, count data, nominal classes, and binary outcomes. To highlight a number of contributions, we introduce the notion of modal calibration for nominal outcomes, we distinguish full, partial, and average calibration in this setting, and we show that double probability integral transform (PIT) calibration is logically independent of previously proposed concepts of calibration for discrete outcomes. Furthermore, we generalize extant results on concepts of calibration that are expressed in terms of properties or functionals of the predictive distributions, such as means, quantiles, or event probabilities. Throughout the paper, we illustrate the concepts and their hierarchical relations in worked examples, and we provide algorithmic tools that support the construction of instructive examples and counterexamples.