A fundamental problem in modern supervised learning is computing reliable conditional prediction intervals in high-dimensional settings: existing methods often rely on restrictive modelling assumptions, do not scale as predictor dimension increases, or only guarantee marginal (population-level) rather than conditional (individual-level) coverage. We introduce the $\textit{lifted predictive model}$ (LPM), a new conditional representation, and propose the MAPS (Model-Agnostic Prediction Sets) algorithm that produces distribution-free conditional prediction intervals and adapts to any trained predictive model. Our procedure is bootstrap-based, scales to high-dimensional inputs and accounts for heteroscedastic errors. We establish the theoretical properties of the LPM, connect prediction accuracy to interval length, and provide sufficient conditions for asymptotic conditional coverage. We evaluate the finite-sample performance of MAPS in a simulation study, and apply our method to simulation-based inference and image classification. In the former, MAPS provides the first approach for debiasing neural Bayes estimators and constructing valid confidence intervals for model parameters given the estimators, at any desired level. In the latter, it provides the first approach that accounts for uncertainty in model calibration and label prediction.
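For reference, the distinction between marginal and conditional coverage drawn above can be written out as follows. This is a sketch in assumed notation that does not come from the abstract: $X$ is the predictor, $Y$ the response, $C_{\alpha}(\cdot)$ a prediction set at nominal level $1-\alpha$, and $C_{\alpha,n}$ the set built from $n$ training observations. It uses the standard definition, which conditions on $X = x$; the paper's own statement may condition on a different quantity and may use different notation.

$$
\begin{aligned}
\text{marginal coverage:} \quad & \mathbb{P}\bigl(Y \in C_{\alpha}(X)\bigr) \ge 1 - \alpha, \\
\text{conditional coverage:} \quad & \mathbb{P}\bigl(Y \in C_{\alpha}(X) \mid X = x\bigr) \ge 1 - \alpha \quad \text{for almost every } x, \\
\text{asymptotic conditional coverage:} \quad & \lim_{n \to \infty} \mathbb{P}\bigl(Y \in C_{\alpha,n}(X) \mid X = x\bigr) = 1 - \alpha .
\end{aligned}
$$

The marginal statement averages over the distribution of $X$, so it can hold even when some individual values of $x$ are systematically under-covered. The conditional statement rules this out.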