Randomized Controlled Trials (RCTs) are the gold standard for causal inference, yet they remain a scarce resource. Large-scale observational data is often available, but it is typically used only for retrospective fusion and is discarded during prospective trial design because of bias concerns. We argue that this "tabula rasa" data acquisition strategy is fundamentally inefficient. In this work, we propose Active Residual Learning, a new paradigm that uses the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. To operationalize this, we introduce the R-Design framework. Theoretically, we establish two key advantages: (1) a structural efficiency gap, where we prove that estimating smooth residual contrasts admits strictly faster convergence rates than reconstructing full outcomes; and (2) information efficiency, where we quantify the redundancy in standard parameter-based acquisition (e.g., BALD) and show that such baselines waste budget on task-irrelevant nuisance uncertainty. We further propose R-EPIG (Residual Expected Predictive Information Gain), a unified criterion that directly targets the causal estimand, either minimizing residual uncertainty for estimation or clarifying decision boundaries for policy learning. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines, confirming that repairing a biased model is far more efficient than learning one from scratch.
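The residual-correction idea can be illustrated with a toy simulation. This is only a minimal sketch under assumptions that are not taken from the paper: the functions `true_cate` and `obs_cate`, the linear bias, the polynomial degrees, and the use of a transformed-outcome regression with known propensity 0.5 are all hypothetical choices made for illustration. It is not the R-Design framework or the R-EPIG criterion, and it uses random rather than active sampling. It compares fitting a simple model to the residual of a biased observational estimate against fitting a more flexible model to the full effect from scratch, using the same small trial.

```python
import numpy as np


def true_cate(x):
    # Hypothetical ground-truth treatment effect (fairly wiggly).
    return np.sin(3 * x) + 0.5 * x


def obs_cate(x):
    # Hypothetical observational estimate: the truth minus a smooth linear bias.
    return true_cate(x) - (0.5 + 0.3 * x)


def simulate_rct(n, rng):
    # Small randomized trial with treatment probability 0.5.
    x = rng.uniform(-1, 1, n)
    t = rng.integers(0, 2, n)
    y = x + t * true_cate(x) + rng.normal(0, 0.5, n)
    # Transformed outcome: its conditional mean given x equals the effect
    # when the propensity is exactly 0.5.
    pseudo = 4.0 * y * (t - 0.5)
    return x, pseudo


def compare(n=400, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1, 1, 201)
    truth = true_cate(grid)
    errs = np.zeros((reps, 3))
    for r in range(reps):
        x, pseudo = simulate_rct(n, rng)
        # Residual route: fit a degree-1 correction on top of the prior.
        resid_fit = np.polyfit(x, pseudo - obs_cate(x), 1)
        est_resid = obs_cate(grid) + np.polyval(resid_fit, grid)
        # From-scratch route: fit a degree-5 polynomial to the full effect.
        scratch_fit = np.polyfit(x, pseudo, 5)
        est_scratch = np.polyval(scratch_fit, grid)
        errs[r] = [
            np.mean((obs_cate(grid) - truth) ** 2),
            np.mean((est_scratch - truth) ** 2),
            np.mean((est_resid - truth) ** 2),
        ]
    # Average mean-squared error over replications for each estimator.
    return errs.mean(axis=0)


mse_obs, mse_scratch, mse_resid = compare()
print(f"observational only: {mse_obs:.3f}")
print(f"from scratch:       {mse_scratch:.3f}")
print(f"residual-corrected: {mse_resid:.3f}")
```

In this toy setting the residual is linear by construction, so two parameters suffice to repair the observational estimate, whereas the from-scratch fit needs a higher-degree polynomial to track the full effect and pays for it in variance. The gap depends entirely on the assumed smoothness of the bias relative to the effect itself.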