Consider a multi-class labelling problem in which the labels take values in $[k]$ and a predictor outputs a distribution over the labels. In this work, we study the following foundational question: are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved with time and sample complexity polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either have sample complexity exponential in $k$, require solving computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms that efficiently calibrate predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form "does the label belong to a subset $T \subseteq [k]$?", for example, "is this an image of an animal?". It ensures that the probabilities obtained by summing the probabilities assigned to the labels in $T$ are close to those of some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information-theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
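To make the projection step concrete, the following is a minimal sketch, not the recalibration algorithm of this work. It sums the predicted probabilities over a subset $T$ to obtain a binary predictor for the task "is the label in $T$?", and then measures a simple binned calibration error for it. The binned error is only a familiar stand-in for the smooth calibration error studied here, and the names `project_onto_subset` and `binned_calibration_error`, as well as the synthetic data, are assumptions made for illustration.

```python
import numpy as np


def project_onto_subset(probs, subset):
    """Sum the predicted probabilities of the labels in `subset` (the set T)."""
    return probs[:, sorted(subset)].sum(axis=1)


def binned_calibration_error(p, y, n_bins=10):
    """Binned calibration error of binary predictions p against 0/1 outcomes y.

    Illustrative stand-in only; this is NOT the smooth calibration error.
    """
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Weight each bin by the fraction of samples that fall into it.
            err += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return err


# Synthetic data (assumption): labels are drawn from the predicted
# distributions, so the predictor is calibrated by construction.
rng = np.random.default_rng(0)
k, n = 5, 20000
probs = rng.dirichlet(np.ones(k), size=n)
cum = probs.cumsum(axis=1)
u = rng.random(n)
labels = np.minimum((u[:, None] > cum).sum(axis=1), k - 1)

T = {0, 2}  # hypothetical subset of labels, e.g. the "animal" classes
p_T = project_onto_subset(probs, T)
y_T = np.isin(labels, list(T)).astype(float)
error = binned_calibration_error(p_T, y_T)
print(f"binned calibration error of the projection onto T: {error:.4f}")
```

Because the synthetic labels are sampled from the predicted distributions, the projected predictor should show only a small sampling-induced error. A miscalibrated predictor would show a larger gap for at least some choices of $T$.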