We study post-selection predictive inference in an online setting. In online predictive tasks, it is common and meaningful to first decide whether the current individual is worth attention before reporting its prediction interval, so that resources are not spent on unimportant units. Because online selection introduces temporal multiplicity among the selected prediction intervals, it is important to control the real-time false coverage-statement rate (FCR), which measures the average miscoverage error over the selected intervals. We develop a general framework named CAS (Calibration after Adaptive Selection) that can wrap around any prediction model and any online selection rule to output post-selection prediction intervals. If the current individual is selected, CAS first performs an adaptive selection on the historical data to construct a calibration set, and then outputs a conformal prediction interval for the unobserved label. We provide tractable constructions of the calibration set for popular online selection rules. We prove that CAS achieves an exact selection-conditional coverage guarantee in finite samples and without distributional assumptions. For decision-driven selection rules, which include most online multiple-testing procedures, CAS exactly controls the real-time FCR below the target level without any distributional assumptions. For online selection with symmetric thresholds, we establish an error bound on the FCR control gap under mild distributional assumptions. To account for distribution shift in online data, we also embed CAS into recent dynamic conformal prediction methods and examine its long-run FCR control. Numerical results on both synthetic and real data show that CAS effectively controls the FCR around the target level and yields narrower prediction intervals than existing baselines across various settings.
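As an illustration of the two-step idea (select on history, then calibrate), here is a minimal sketch under simplifying assumptions that are ours, not the paper's: the selection rule is a fixed threshold on the model's prediction ("select if prediction > threshold"), the nonconformity score is the absolute residual, and the calibration set is simply the historical points that the same rule would also have selected. The paper's constructions for adaptive rules (for example, online multiple-testing procedures) are more involved and are not reproduced here. The names `cas_fixed_threshold`, `conformal_quantile`, and `real_time_fcr` are hypothetical. `real_time_fcr` returns the realized proportion of reported intervals that miss their label; the real-time FCR is the expectation of this quantity.

```python
import math
from typing import List, Optional, Tuple


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.

    Returns +inf when the calibration set is too small for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def cas_fixed_threshold(
    history: List[Tuple[float, float]],  # (prediction, observed label) pairs
    pred_t: float,
    threshold: float,
    alpha: float,
) -> Optional[Tuple[float, float]]:
    """Hypothetical CAS-style step for the rule 'select if prediction > threshold'."""
    if pred_t <= threshold:
        return None  # current individual not selected: no interval is reported
    # Calibration set: historical points the same rule would also have selected.
    scores = [abs(y - mu) for mu, y in history if mu > threshold]
    q = conformal_quantile(scores, alpha)
    return (pred_t - q, pred_t + q)


def real_time_fcr(selected: List[bool], covered: List[bool]) -> float:
    """Realized false coverage proportion: missed selected intervals / max(#selected, 1)."""
    n_sel = sum(selected)
    n_miss = sum(s and not c for s, c in zip(selected, covered))
    return n_miss / max(n_sel, 1)


if __name__ == "__main__":
    history = [(1.0, 1.5), (2.0, 1.0), (0.0, 5.0), (3.0, 3.2)]
    print(cas_fixed_threshold(history, pred_t=2.5, threshold=0.5, alpha=0.5))
    print(cas_fixed_threshold(history, pred_t=0.2, threshold=0.5, alpha=0.5))
```

In the usage example, the point `(0.0, 5.0)` is excluded from calibration because the rule would not have selected it, so its large residual does not inflate the interval. Calibrating only on comparably selected points is what distinguishes this from naively calibrating on all historical data.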