Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverage due to selection bias. Alternatively, we could further split the data to perform selection and calibration separately, but this comes at a steep cost if the size of the dataset is limited. In this paper, we address the challenge of constructing a valid prediction set after efficiency-oriented model selection. Our novel methods can be implemented efficiently and admit finite-sample validity guarantees without invoking additional sample-splitting. We show that our methods yield prediction sets with asymptotically optimal size under a certain notion of continuity for the model class. The improved efficiency of the prediction sets constructed by our methods is further demonstrated through applications to synthetic datasets in various settings and a real data example.
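To make the two baselines described above concrete, the following is a minimal sketch of (i) naive selection and calibration on the same hold-out set and (ii) the sample-splitting alternative. It is not the method proposed in this paper. It assumes absolute-residual conformity scores, so each prediction set is an interval whose half-width is the conformal threshold; the function names and the toy linear "pretrained" models in the simulation are hypothetical and chosen only for illustration.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this level
    return float(np.sort(scores)[k - 1])


def naive_select_and_calibrate(scores_by_model, alpha):
    """Select and calibrate on the SAME hold-out scores (subject to selection bias).

    scores_by_model: array of shape (K, n), one row of hold-out scores per model.
    Returns (index of the selected model, its conformal threshold).
    """
    thresholds = np.array([conformal_quantile(s, alpha) for s in scores_by_model])
    best = int(np.argmin(thresholds))
    return best, float(thresholds[best])


def split_select_then_calibrate(scores_by_model, alpha):
    """Select on the first half of the hold-out set, calibrate on the second half."""
    scores_by_model = np.asarray(scores_by_model, dtype=float)
    half = scores_by_model.shape[1] // 2
    sel = [conformal_quantile(s[:half], alpha) for s in scores_by_model]
    best = int(np.argmin(sel))
    return best, conformal_quantile(scores_by_model[best, half:], alpha)


def estimate_naive_coverage(n=50, n_models=100, alpha=0.1,
                            n_trials=200, n_test=2000, seed=0):
    """Monte Carlo estimate of test coverage for the naive procedure.

    Toy setup (an assumption, not from the paper): x and y are independent
    standard normals and model k predicts w_k * x for a random weight w_k.
    """
    rng = np.random.default_rng(seed)
    coverages = []
    for _ in range(n_trials):
        w = 0.3 * rng.standard_normal(n_models)
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        scores = np.abs(y[None, :] - w[:, None] * x[None, :])  # shape (K, n)
        best, thr = naive_select_and_calibrate(scores, alpha)
        x_t, y_t = rng.standard_normal(n_test), rng.standard_normal(n_test)
        coverages.append(np.mean(np.abs(y_t - w[best] * x_t) <= thr))
    return float(np.mean(coverages))
```

Calling `estimate_naive_coverage()` and comparing the result with the nominal level `1 - alpha` gives a rough empirical check of the coverage loss under this toy setup. The splitting variant avoids the bias by construction, but it calibrates on only half of the hold-out data, which is the cost the abstract refers to.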