In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome or label with finite-sample coverage guarantees for any machine learning predictor. We consider here the case where such prediction sets are reported only after a selection process. The selection process requires that the selected prediction sets be `informative' in a well-defined sense. We consider both classification and regression settings in which the analyst may deem informative only those samples whose prediction label sets or prediction intervals are small enough, exclude null values, or obey other appropriate `monotone' constraints. This covers many settings of potential interest across applications, and we develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. While conformal prediction sets after selection have been the focus of much recent literature in the field, the newly introduced procedures, called InfoSP and InfoSCOP, are to our knowledge the first to provide FCR control for informative prediction sets. We demonstrate the usefulness of the resulting procedures on real and simulated data.
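As background for the finite-sample coverage claim above, the standard split conformal mechanism can be sketched as follows. This is a generic illustration of marginal coverage, not the paper's InfoSP or InfoSCOP procedures (which add a selection step with FCR control); the zero-mean "model" and Gaussian data are toy assumptions for the sketch.

```python
import numpy as np

def conformal_quantile(cal_residuals, alpha):
    """Finite-sample-corrected quantile of calibration residuals.

    With n calibration points, the ceil((n+1)*(1-alpha))/n quantile
    yields marginal coverage >= 1 - alpha for a new exchangeable point.
    """
    n = len(cal_residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_residuals, level, method="higher")

rng = np.random.default_rng(0)

# Toy setup: the fitted model predicts 0 for every point, and the
# true outcomes are standard normal (an assumption for illustration).
y_cal = rng.normal(size=1000)
pred_cal = np.zeros(1000)
residuals = np.abs(y_cal - pred_cal)

# Half-width of the split conformal interval at 90% coverage.
q = conformal_quantile(residuals, alpha=0.1)
# Prediction interval for a new point x: [pred(x) - q, pred(x) + q]
```

An "informative" selection rule in the sense above would, for instance, keep only points whose interval width `2 * q(x)` falls below a threshold; naively reporting the marginal intervals on that selected subsample is what breaks FCR and motivates the paper's procedures.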