In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome or label with finite-sample coverage for any machine learning predictor. We consider the case where such prediction sets are reported only after a selection process that requires the selected prediction sets to be 'informative' in a well-defined sense. In both the classification and regression settings, the analyst may regard as informative only those samples whose prediction sets are small enough, exclude null values, or obey other appropriate 'monotone' constraints. We develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. While conformal prediction sets after selection have been the focus of much recent literature, the newly introduced procedures, called InfoSP and InfoSCOP, are to our knowledge the first to provide FCR control for informative prediction sets. We show the usefulness of the resulting procedures on real and simulated data.
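To make the setting concrete, the following is a minimal sketch of split conformal prediction intervals followed by a naive "informative" selection (keeping only intervals that exclude the null value 0) and the resulting false coverage proportion on the selected sample. It is not InfoSP or InfoSCOP, whose details the abstract does not give. The function names, the synthetic data, and the stand-in predictor `mu_hat(x) = x` are all assumptions made for illustration. Naive selection of this kind need not control the FCR, which is the problem the abstract addresses.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points for this level
    return np.sort(scores)[k - 1]


def false_coverage_proportion(lower, upper, y, selected):
    """Fraction of selected intervals that miss the outcome (0.0 if none selected)."""
    if not selected.any():
        return 0.0
    miss = (y < lower) | (y > upper)
    return float(miss[selected].mean())


def demo(alpha=0.1, n_cal=500, n_test=1000, seed=0):
    """Hypothetical example: y = x + noise, with stand-in predictor mu_hat(x) = x."""
    rng = np.random.default_rng(seed)
    x_cal = rng.uniform(-2, 2, n_cal)
    y_cal = x_cal + rng.normal(size=n_cal)
    x_test = rng.uniform(-2, 2, n_test)
    y_test = x_test + rng.normal(size=n_test)

    # Calibration: absolute residuals of the (assumed) predictor.
    q = conformal_quantile(np.abs(y_cal - x_cal), alpha)
    lower, upper = x_test - q, x_test + q

    # Naive selection: keep only intervals that exclude the null value 0.
    selected = (lower > 0) | (upper < 0)
    fcp = false_coverage_proportion(lower, upper, y_test, selected)
    return int(selected.sum()), fcp


if __name__ == "__main__":
    n_selected, fcp = demo()
    print(f"selected {n_selected} intervals, false coverage proportion {fcp:.3f}")
```

The FCR is the expectation of the false coverage proportion computed here. Repeating `demo` over many seeds and averaging the second return value would give a Monte Carlo estimate of the FCR for this naive procedure.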