Selective prediction, where a model has the option to abstain from making a decision, is crucial for machine learning applications in which mistakes are costly. In this work, we focus on distributional regression and introduce a framework that enables the model to abstain from estimation in situations of high uncertainty. We refer to this approach as distributional regression with reject option, inspired by similar concepts in classification and regression with reject option. We study the scenario where the rejection rate is fixed. We derive a closed-form expression for the optimal rule, which relies on thresholding the entropy function of the Continuous Ranked Probability Score (CRPS). We propose a semi-supervised estimation procedure for the optimal rule that uses two datasets: the first, labeled, is used to estimate both the conditional distribution function and the entropy function of the CRPS, while the second, unlabeled, is employed to calibrate the desired rejection rate. Notably, the control of the rejection rate is distribution-free. Under mild conditions, we show that our procedure is asymptotically as effective as the optimal rule, in terms of both error rate and rejection rate. Additionally, we establish rates of convergence for our approach based on distributional k-nearest neighbors. A numerical analysis on real-world datasets demonstrates the strong performance of our procedure.
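To make the plug-in procedure concrete: for a CDF F, the generalized entropy of the CRPS is H(F) = E_{Y~F}[CRPS(F, Y)] = ∫ F(z)(1 − F(z)) dz, so the optimal rule abstains where the conditional entropy H(F_x) exceeds a threshold chosen to hit the target rejection rate. Below is a minimal NumPy sketch of one possible instantiation with a k-nearest-neighbors estimate of F_x; the function names (knn_entropy, calibrate_threshold, predict_or_reject) and estimator details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def knn_entropy(x_query, X_lab, y_lab, k=20):
    """Plug-in CRPS entropy H(F_x) = integral of F_x(z)(1 - F_x(z)) dz,
    with F_x the empirical CDF of the k nearest labeled responses."""
    d = np.linalg.norm(X_lab - x_query, axis=1)
    y = np.sort(y_lab[np.argsort(d)[:k]])       # sorted k-NN responses
    p = np.arange(1, k) / k                     # F_x jumps to j/k at y[j-1]
    # Exact integral of the piecewise-constant F_x(z)(1 - F_x(z)):
    return np.sum(p * (1.0 - p) * np.diff(y))

def calibrate_threshold(X_unlab, X_lab, y_lab, reject_rate, k=20):
    """Empirical (1 - reject_rate) quantile of the entropy scores on the
    unlabeled sample: this is what makes the rejection-rate control
    distribution-free."""
    scores = np.array([knn_entropy(x, X_lab, y_lab, k) for x in X_unlab])
    return np.quantile(scores, 1.0 - reject_rate)

def predict_or_reject(x, X_lab, y_lab, tau, k=20):
    """Abstain when the estimated CRPS entropy exceeds the threshold tau;
    otherwise return the k-NN sample defining the empirical predictive CDF."""
    if knn_entropy(x, X_lab, y_lab, k) > tau:
        return None                             # reject: uncertainty too high
    d = np.linalg.norm(X_lab - x, axis=1)
    return np.sort(y_lab[np.argsort(d)[:k]])
```

In this sketch, calibrating tau as the (1 − ε) empirical quantile of the entropy scores over the unlabeled set yields a rejection rate of approximately ε irrespective of the data-generating distribution, mirroring the distribution-free control claimed above.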