Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in $k$, or needing to solve computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form: does the label belong to a subset $T \subseteq [k]$: e.g. is this an image of an animal? It ensures that the probabilities predicted by summing the probabilities assigned to labels in $T$ are close to some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
翻译:考虑一个多类标记问题,其中标签取值于 $[k]$ 范围内,预测器输出一个关于标签的概率分布。本文研究以下基础性问题:是否存在既能对有意义预测提供强有力保证,又能在时间和样本复杂度上关于 $k$ 多项式可达到的多类校准概念?现有校准概念在计算效率与表达能力之间存在权衡:要么样本复杂度关于 $k$ 呈指数增长,要么需要解决计算上难以处理的问题,要么仅能提供较弱的保证。我们的主要贡献在于提出一种满足所有上述需求的校准概念:针对多类预测,我们形式化了投影平滑校准的鲁棒概念,并给出了在该定义下高效校准预测器的新重校准算法,其复杂度关于 $k$ 呈多项式增长。投影平滑校准为所有下游决策者提供强有力保证,这些决策者希望将预测器用于如下形式的二元分类问题:标签是否属于子集 $T \subseteq [k]$?例如,这张图片是否为动物图像?它确保通过累加 $T$ 中标签分配的概率所得到的预测概率,接近于针对该任务完全校准的二元预测器。我们还证明,该定义的自然加强版本在计算上难以实现:它们会遭遇信息论障碍或计算困难。支撑我们上下界结论的关键,在于我们证明了多类校准与标准二元预测设定中已被充分研究的不可知学习问题之间存在紧密关联。