Given data on a random variable \(Y\), a prediction set with miscoverage level \(α\in (0,1)\) is a set that contains a new draw of \(Y\) with probability at least \(1-α\). Among all prediction sets satisfying this coverage property, the oracle prediction set is the one with minimal volume. The oracle prediction set offers a complementary view of the distribution of \(Y\), beyond point estimators such as the mean and quantiles, and has attracted considerable interest recently. This paper develops methods for estimating such prediction sets conditional on observed covariates when \(Y\) is \textit{censored} or \textit{interval-valued}. We characterise the oracle prediction set under partial identification induced by interval censoring and propose consistent estimators for both oracle prediction intervals and more general oracle prediction sets consisting of multiple disjoint intervals. In addition, we apply conformal inference to construct finite-sample valid prediction sets for interval outcomes that remain consistent as the sample size grows, using a conformity score tailored to interval data. The proposed procedure accounts for three sources of uncertainty: irreducible prediction uncertainty due to the stochastic nature of outcomes, modelling uncertainty arising from partial identification, and sampling uncertainty that vanishes as the sample size increases. We conduct Monte Carlo simulations and two empirical applications using UK job postings data and the US Current Population Survey. The results demonstrate the robustness and efficiency of the proposed methods.
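To illustrate only the minimal-volume idea behind the oracle prediction interval, the sketch below finds the shortest interval covering at least \(\lceil (1-α)n \rceil\) points of a sample. It assumes an uncensored, point-valued \(Y\) with no covariates; the function name `shortest_interval` is hypothetical, and this is not the paper's estimator for censored or interval-valued outcomes.

```python
import math
from typing import Sequence, Tuple


def shortest_interval(samples: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Hypothetical helper: shortest [lo, hi] covering at least ceil((1 - alpha) * n) points.

    Illustrative only: assumes point-valued, uncensored data and no covariates.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    ys = sorted(samples)
    n = len(ys)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of sample points the interval must contain (1 <= k <= n for alpha in (0, 1)).
    k = math.ceil((1 - alpha) * n)
    # Any interval covering k points spans k consecutive order statistics,
    # so scan every such window and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ys[i + k - 1] - ys[i])
    return ys[best], ys[best + k - 1]


if __name__ == "__main__":
    # A sample with a cluster near 0 and a few far-away points.
    print(shortest_interval([0, 1, 2, 10, 11], alpha=0.5))
```

When the sample is multimodal, a single window of this kind can be much longer than a union of several short windows with the same total coverage, which is why prediction sets consisting of multiple disjoint intervals can have smaller volume.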