Smoothed Elicitation Complexity for Approximate $Γ$-calibration of Discrete Classification Tasks

Calibration is a prominent way to evaluate the trustworthiness of machine learning models. In the binary outcome setting, a probabilistic predictor is calibrated if, conditioned on the model's prediction, outcomes are realized according to that predicted distribution. Straightforward extensions of binary calibration definitions to probabilistic multiclass classifiers suffer from a complexity blowup, because the space of predictions grows exponentially in the number of classes $n$. As a remedy, Noarov and Roth (2023) propose multiclass calibration with predictions that are properties of the outcome distribution, so that complexity scales with the dimension $d$ of the property, called its elicitation complexity, rather than with the number of classes $n$. Previous work on approximate property calibration is generally limited to continuous scalar properties, even though many properties of interest, such as the mode or rankings, are discrete. We characterize approximate property calibration for strongly orderable discrete properties by using Lipschitz continuous properties as an intermediary. To our knowledge, this work is the first to provide approximate calibration results for discrete properties. Along the way, we characterize the Lipschitz elicitation complexity of strongly orderable discrete properties by constructing algorithms that design these Lipschitz properties, and we prove that the resulting properties can be post-processed to recover the original discrete property.
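As a rough illustration of what calibration with respect to a discrete property can look like, the sketch below checks the mode on a finite sample: for each predicted label, it asks whether that label is a most frequent outcome among the instances that received this prediction. This is a simplified reading of property calibration assumed for illustration, not the paper's formal definition or algorithm, and the function name `mode_calibration_report` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def mode_calibration_report(predictions, outcomes):
    """Map each predicted label to True if it is an empirical mode of the
    outcomes observed when it was predicted (illustrative check only)."""
    # Group realized outcomes by the label the predictor emitted.
    groups = defaultdict(list)
    for pred, y in zip(predictions, outcomes):
        groups[pred].append(y)

    report = {}
    for pred, ys in groups.items():
        counts = Counter(ys)
        top = max(counts.values())
        # Ties are allowed: any label with the maximal count is a mode.
        empirical_modes = {label for label, c in counts.items() if c == top}
        report[pred] = pred in empirical_modes
    return report


# Toy data (hypothetical): predicted modes and realized outcomes.
predictions = ["a", "a", "a", "b", "b", "b"]
outcomes = ["a", "a", "c", "c", "c", "b"]
report = mode_calibration_report(predictions, outcomes)
print(report)
```

On this toy sample the prediction "a" agrees with the conditional empirical mode while "b" does not. An exact check like this is brittle under sampling noise, which is one intuition for why approximate notions of calibration are studied.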

