In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome/label with finite-sample coverage for any machine learning predictor. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be `informative' in a well-defined sense. We consider both the classification and regression settings, where the analyst may deem informative only those samples whose prediction sets are small enough, exclude null values, or obey other appropriate `monotone' constraints. We develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. While conformal prediction sets after selection have been the focus of much recent literature in the field, the newly introduced procedures, called InfoSP and InfoSCOP, are to our knowledge the first ones providing FCR control for informative prediction sets. We show the usefulness of our resulting procedures on real and simulated data.