Learning the Individual Treatment Effect (ITE) is essential for personalized decision making, yet causal inference has traditionally focused on aggregate treatment effects. While integrating conformal prediction with causal inference can provide valid uncertainty quantification for ITEs, the resulting prediction intervals are often excessively wide, limiting their practical utility. To address this limitation, we introduce \underline{S}urrogate-assisted \underline{C}onformal \underline{I}nference for \underline{E}fficient I\underline{N}dividual \underline{C}ausal \underline{E}ffects (SCIENCE), a framework designed to construct more efficient prediction intervals for ITEs. SCIENCE applies to various data configurations, including semi-supervised and surrogate-assisted semi-supervised learning. It accommodates covariate shift between source data, which contain primary outcomes, and target data, which may include only surrogate outcomes or covariates. Leveraging semi-parametric efficiency theory, SCIENCE produces rate doubly robust prediction intervals under mild convergence-rate conditions, permitting the use of flexible non-parametric models to estimate nuisance functions. We quantify efficiency gains by comparing semi-parametric efficiency bounds with and without the incorporation of surrogates. Simulation studies demonstrate that our surrogate-assisted intervals offer substantial efficiency improvements over existing methods while maintaining valid group-conditional coverage. In an application to the phase 3 Moderna COVE COVID-19 vaccine trial, SCIENCE illustrates how multiple surrogate markers can be leveraged to generate more efficient prediction intervals.
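For orientation, the generic weighted split conformal step that covariate-shift-aware intervals typically build on can be sketched as follows. This is a minimal illustration and not the SCIENCE procedure: it uses no surrogates and no efficiency-theoretic correction. The function names `weighted_conformal_quantile` and `prediction_interval` are hypothetical, and the sketch assumes that conformity scores (for example, absolute residuals on labeled source data) and density-ratio weights between target and source covariates have already been computed.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    scores      : conformity scores on labeled source (calibration) data
    weights     : nonnegative density-ratio weights for those points (assumed given)
    test_weight : weight of the target point, whose mass is placed at +infinity
    alpha       : miscoverage level in (0, 1)
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the scores and carry the weights along.
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]

    # Normalize by the total mass, including the target point's mass.
    total = weights.sum() + test_weight
    cumulative = np.cumsum(weights) / total

    # Smallest score whose cumulative weighted mass reaches 1 - alpha.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")

    # If the calibration mass never reaches 1 - alpha, the quantile is
    # +infinity and the resulting interval is the whole real line.
    return scores[idx] if idx < len(scores) else np.inf


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction, with half-width q."""
    return point_prediction - q, point_prediction + q
```

With all weights equal, this reduces to ordinary split conformal prediction. The width of the resulting interval is governed entirely by the score quantile, which is the quantity an efficiency-oriented method would aim to shrink while preserving coverage.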