This work addresses the problem of constructing reliable prediction intervals for individual counterfactual outcomes. Existing conformal counterfactual inference (CCI) methods provide marginal coverage guarantees but often produce overly conservative intervals, particularly under treatment imbalance, when counterfactual samples are scarce. We introduce synthetic data-powered CCI (SP-CCI), a framework that augments the calibration set with synthetic counterfactual labels generated by a pre-trained counterfactual model. To ensure validity, SP-CCI incorporates the synthetic samples into a conformal calibration procedure based on risk-controlling prediction sets (RCPS), with a debiasing step informed by prediction-powered inference (PPI). We prove that SP-CCI achieves tighter prediction intervals while preserving marginal coverage, with theoretical guarantees under both exact and approximate importance weighting. Experiments on several datasets confirm that SP-CCI consistently reduces interval width relative to standard CCI in every setting considered.
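The calibration idea described above can be illustrated with a small sketch. This is not the SP-CCI algorithm itself: it assumes absolute-residual scores (so the interval would be the point prediction plus or minus the threshold), uses a plain PPI-style point estimate of miscoverage instead of an RCPS upper confidence bound, and omits importance weighting entirely. The function and argument names (`ppi_miscoverage`, `calibrate_threshold`, `real_scores`, `synth_scores_paired`, `synth_scores_extra`) are hypothetical.

```python
import numpy as np


def ppi_miscoverage(lam, real_scores, synth_scores_paired, synth_scores_extra):
    """PPI-style estimate of the miscoverage rate at threshold `lam`.

    real_scores:          scores computed from real labels (n points)
    synth_scores_paired:  scores from synthetic labels on the same n points
    synth_scores_extra:   scores from synthetic labels on extra points
    All names and the score definition are assumptions for illustration.
    """
    def loss(scores):
        # 1.0 where the score falls outside the threshold (a miss), else 0.0
        return (np.asarray(scores) > lam).astype(float)

    # Miscoverage estimated from the abundant synthetic samples
    synthetic_term = loss(synth_scores_extra).mean()
    # Rectifier: average gap between real and synthetic losses on paired points
    rectifier = (loss(real_scores) - loss(synth_scores_paired)).mean()
    return synthetic_term + rectifier


def calibrate_threshold(alpha, real_scores, synth_scores_paired, synth_scores_extra):
    """Scan thresholds from largest to smallest and keep the smallest one
    reached before the estimated miscoverage first exceeds `alpha`."""
    candidates = np.sort(
        np.concatenate([real_scores, synth_scores_paired, synth_scores_extra])
    )[::-1]
    chosen = candidates[0]  # at the largest score the estimate is 0
    for lam in candidates:
        if ppi_miscoverage(lam, real_scores, synth_scores_paired,
                           synth_scores_extra) > alpha:
            break
        chosen = lam
    return chosen
```

Scanning downward and stopping at the first violation matters because the debiased estimate need not be monotone in the threshold. A method with finite-sample guarantees would replace the point estimate with a valid upper confidence bound and reweight the samples to account for treatment assignment, as the abstract indicates.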