Conformal inference has played a pivotal role in providing uncertainty quantification for black-box ML prediction algorithms with finite-sample guarantees. Traditionally, conformal prediction requires the miscoverage level to be specified independently of the data. In practical applications, one might want to update the miscoverage level after computing the prediction set. For example, in binary classification, the analyst might start with $95\%$ prediction sets and find that most of them contain both outcome classes. Since prediction sets containing both classes are undesirable, the analyst might prefer to consider, say, $80\%$ prediction sets. Constructing prediction sets that guarantee coverage at a data-dependent miscoverage level can be viewed as a post-selection inference problem. In this work, we develop uniform conformal inference with finite-sample prediction guarantees for arbitrary data-dependent miscoverage levels, using distribution-free confidence bands for distribution functions. This allows practitioners to freely trade coverage probability for the quality of the prediction set under any criterion of their choice (say, the size of the prediction set) while maintaining finite-sample guarantees similar to those of traditional conformal inference.
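To make the role of a confidence band concrete, the following is a minimal sketch of one simple way to obtain thresholds that remain valid for every miscoverage level at once. It is an illustration under stated assumptions and not necessarily the construction developed in this work: it uses the two-sided Dvoretzky-Kiefer-Wolfowitz (DKW) inequality as the distribution-free band, assumes i.i.d. calibration scores, and the names `dkw_margin`, `uniform_threshold`, and the band failure probability `delta` are hypothetical.

```python
import numpy as np


def dkw_margin(n, delta):
    """Half-width eps of the two-sided DKW band: with probability >= 1 - delta,
    sup_t |F_n(t) - F(t)| <= eps for the empirical CDF F_n of n i.i.d. draws."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))


def uniform_threshold(cal_scores, alpha, delta=0.1):
    """Score threshold t with F_n(t) - eps >= 1 - alpha.

    On the event that the DKW band holds, P(score <= t) >= 1 - alpha for
    every alpha simultaneously, so alpha may be chosen after seeing the data.
    (Assumption: illustrative sketch, not the paper's exact procedure.)
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    target = 1.0 - alpha + dkw_margin(n, delta)
    if target > 1.0:
        # The band is too wide to certify this level: use the trivial set.
        return np.inf
    # Smallest order statistic whose empirical CDF value k/n reaches target.
    k = max(int(np.ceil(n * target)), 1)
    return scores[k - 1]
```

A prediction set at level `alpha` then contains every candidate outcome whose nonconformity score is at most `uniform_threshold(cal_scores, alpha)`. Because the band holds uniformly in the threshold, the analyst can inspect the sets for several values of `alpha` and keep the one they prefer. The price is the extra margin `dkw_margin(n, delta)`, which makes each set somewhat larger than its fixed-level split-conformal counterpart.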