This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique, namely weighted conformal inference (Tibshirani et al., 2019), to correct for the distribution shift introduced by this subsetting procedure. Instead of imposing a fixed threshold on the censoring time when subsetting the data, our new method allows for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can yield LPBs that are less conservative and more informative. We show that in the Type I right-censoring setting, if either the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, and in the latter case it additionally achieves approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantages over competing methods. Finally, we apply our method to a real dataset to generate LPBs for users' active times on a mobile app.
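To make the subset-then-reweight idea concrete, the following is a minimal sketch of the fixed-threshold baseline described above, not of the covariate-dependent method proposed in this paper. Every name in it is hypothetical: `weighted_conformal_lpb` is an illustrative function, the one-sided score (estimated quantile minus the observed time truncated at `c0`) is an assumed choice, and the weights are assumed to be supplied by the caller as estimates of 1 / P(C >= c0 | X = x). The calibration arrays are assumed to contain only points whose censoring time is at least `c0`.

```python
import numpy as np


def weighted_conformal_lpb(q_cal, t_cal, w_cal, q_test, w_test, c0, alpha=0.1):
    """Sketch of a lower predictive bound via weighted conformal inference.

    q_cal  : estimated conditional quantiles at the calibration points
    t_cal  : observed (possibly censored) times at the calibration points
    w_cal  : covariate-shift weights at the calibration points (assumed given)
    q_test : estimated conditional quantile at the test point
    w_test : covariate-shift weight at the test point (assumed given)
    c0     : fixed censoring threshold used to subset the data
    """
    q_cal = np.asarray(q_cal, dtype=float)
    t_cal = np.asarray(t_cal, dtype=float)
    w_cal = np.asarray(w_cal, dtype=float)

    # One-sided scores against the time truncated at c0 (assumed score choice).
    scores = q_cal - np.minimum(t_cal, c0)
    order = np.argsort(scores)
    scores, weights = scores[order], w_cal[order]

    # Normalise the weights, reserving mass for the test point, whose
    # score is treated as +infinity.
    probs = weights / (weights.sum() + w_test)
    cum = np.cumsum(probs)

    # Index of the first score whose cumulative mass reaches 1 - alpha.
    idx = int(np.searchsorted(cum, 1.0 - alpha, side="left"))
    if idx >= len(scores):
        # The weighted quantile is infinite, so only the trivial bound
        # remains (assuming survival times are nonnegative).
        return 0.0

    eta = scores[idx]
    # Shift the quantile estimate down by eta, cap at c0, floor at zero.
    return float(max(min(q_test - eta, c0), 0.0))
```

With three equally weighted calibration points, `weighted_conformal_lpb([5, 5, 5], [4, 2, 10], [1, 1, 1], 5.0, 1.0, c0=6.0, alpha=0.5)` shifts the test-point quantile estimate down by the middle score. Because a single `c0` is applied to every point, the sketch also shows where a covariate-dependent threshold would enter: in the truncation, in the subsetting, and in the weights.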