Selective classifiers (SCs) have attracted growing interest in areas such as medical diagnostics, autonomous driving, and the justice system. The Area Under the Risk-Coverage Curve (AURC) has become the primary metric for evaluating SCs. In this work, we introduce a simpler representation of the population AURC, interpretable as a weighted risk function, and propose a Monte Carlo plug-in estimator for the finite-sample setting. We show that this estimator is consistent and yields a low-bias estimate of the true weights, with a tightly bounded mean squared error (MSE). We empirically demonstrate the effectiveness of the estimator on a comprehensive benchmark spanning multiple datasets, model architectures, and Confidence Score Functions (CSFs).
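To make the "weighted risk" reading concrete, the following minimal sketch computes the conventional empirical AURC in two equivalent ways: as the average selective risk over all coverage levels, and as a rank-weighted sum of per-sample losses. It is an illustration under stated assumptions, not the Monte Carlo plug-in estimator proposed in this work: the function names (`empirical_aurc`, `rank_weights`, `weighted_aurc`) are hypothetical, higher confidence is assumed to mean "accept first", and ties in the confidence scores are broken by input order.

```python
import numpy as np


def empirical_aurc(confidences, losses):
    """Average selective risk over coverage levels 1/n, 2/n, ..., 1."""
    confidences = np.asarray(confidences, dtype=float)
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    # Sort samples from most to least confident (assumption: higher = safer).
    order = np.argsort(-confidences, kind="stable")
    sorted_losses = losses[order]
    # Selective risk when only the k most confident samples are accepted.
    selective_risk = np.cumsum(sorted_losses) / np.arange(1, n + 1)
    return float(selective_risk.mean())


def rank_weights(n):
    """Weight of the i-th most confident sample: (1/n) * sum_{k=i}^{n} 1/k."""
    inverse_ranks = 1.0 / np.arange(1, n + 1)
    # Reverse cumulative sum gives the tail sums of the harmonic series.
    return np.cumsum(inverse_ranks[::-1])[::-1] / n


def weighted_aurc(confidences, losses):
    """Same quantity as empirical_aurc, written as a weighted sum of losses."""
    confidences = np.asarray(confidences, dtype=float)
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(-confidences, kind="stable")
    return float(np.dot(rank_weights(losses.size), losses[order]))


if __name__ == "__main__":
    # Toy data: 0/1 losses, with the two least confident predictions wrong.
    conf = [0.9, 0.8, 0.6, 0.3]
    loss = [0.0, 0.0, 1.0, 1.0]
    print(empirical_aurc(conf, loss), weighted_aurc(conf, loss))
```

The two functions agree because swapping the order of summation in the averaged cumulative means assigns each loss a weight that depends only on its confidence rank; the least confident samples receive the smallest weights, since they enter the accepted set only at high coverage.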