Conformal inference has played a pivotal role in providing uncertainty quantification for black-box ML prediction algorithms with finite-sample guarantees. Traditionally, conformal inference requires the miscoverage level to be specified independently of the data. In practical applications, one might want to update the miscoverage level after computing the prediction set. For example, in binary classification, the analyst might start with 95$\%$ prediction sets and see that most of them contain both outcome classes. Because prediction sets containing both classes are undesirable, the analyst might then prefer to consider, say, 80$\%$ prediction sets. Constructing prediction sets that guarantee coverage at a data-dependent miscoverage level can be viewed as a post-selection inference problem. In this work, we develop simultaneous conformal inference to account for data-dependent miscoverage levels. Under the assumption of independent and identically distributed observations, our proposed methods have a finite-sample guarantee that holds simultaneously over all miscoverage levels. This allows practitioners to freely trade coverage probability for the quality of the prediction set by any criterion of their choice (say, the size of the prediction set) while maintaining finite-sample guarantees similar to those of traditional conformal inference.
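To make the idea of a guarantee that holds over all miscoverage levels concrete, here is a minimal sketch. It is not the procedure developed in this work; it swaps in a standard tool, the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, to inflate the usual split-conformal quantile by a margin that is uniform in the miscoverage level. Under the i.i.d. assumption, with probability at least `1 - delta` over the calibration data, every threshold returned below covers a new score with probability at least `1 - alpha`, simultaneously for all `alpha`, so `alpha` may be chosen after looking at the prediction sets. The names `dkw_margin`, `simultaneous_threshold`, `prediction_set`, and the parameter `delta` are hypothetical and introduced only for illustration.

```python
import math

import numpy as np


def dkw_margin(n, delta):
    """Two-sided DKW margin: sup_t |F_n(t) - F(t)| <= margin w.p. >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def simultaneous_threshold(cal_scores, alpha, delta=0.1):
    """Score threshold valid for any alpha, including data-dependent choices.

    Assumption (illustrative, not the paper's method): calibration scores are
    i.i.d. nonconformity scores from a held-out set, as in split conformal.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    # Inflate the target level by the uniform DKW margin.
    level = 1.0 - alpha + dkw_margin(n, delta)
    k = max(math.ceil(n * level), 1)
    if k > n:
        # Too few calibration points for this alpha: the set is trivial.
        return math.inf
    # The k-th smallest score has empirical CDF value at least k / n >= level.
    return float(scores[k - 1])


def prediction_set(label_scores, threshold):
    """Keep every candidate label whose nonconformity score is within threshold."""
    return [label for label, s in enumerate(label_scores) if s <= threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal_scores = rng.uniform(size=1000)  # stand-in for real calibration scores
    test_label_scores = [0.10, 0.95]     # hypothetical scores for labels 0 and 1
    # The analyst may inspect the 95% set first and then move to 80%.
    for alpha in (0.05, 0.20):
        t = simultaneous_threshold(cal_scores, alpha)
        print(alpha, prediction_set(test_label_scores, t))
```

The price of simultaneity in this sketch is the margin of order `sqrt(log(2/delta) / (2n))` added to every level, and the guarantee is stated with high probability over the calibration data rather than marginally; the methods proposed in this work may calibrate this trade-off differently.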