Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by learning embeddings that map each source's features into a common representation space. OS outcome models are transferred to the RCT embedding space and calibrated using trial data, preserving causal identification from randomization. Finite-sample risk bounds decompose into alignment error, outcome-model complexity, and calibration complexity terms, identifying when embedding alignment outperforms imputation. Under the calibration-based linear variant, the framework provides protection against negative transfer; the neural variant can be vulnerable under severe distributional shift. Under sparse linear models, the embedding approach strictly generalizes imputation. Simulations across 51 settings confirm that (i) calibration-based methods are equivalent for linear CATEs, and (ii) the neural embedding variant wins all 22 nonlinear-regime settings with large margins.
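The transfer-then-calibrate idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the CALM implementation: the data-generating process, the helper names (`simulate`, `embed`, `fit_linear`, `predict_linear`), and the constant `os_bias` are all hypothetical. The embedding is a trivial placeholder that keeps the shared covariates, whereas CALM learns it. The calibration step is a swapped-in, standard technique: a linear regression of the inverse-probability-weighted transformed outcome on the transferred OS estimate, which relies only on the known randomization probability in the trial.

```python
import numpy as np

rng = np.random.default_rng(0)
E_RCT = 0.5  # known randomization probability in the toy trial (assumption)


def simulate(n, randomized, os_bias=0.0):
    """Toy source with 3 covariates: columns 0-1 are shared, column 2 is source-specific."""
    X = rng.normal(size=(n, 3))
    tau = 1.0 + 0.5 * X[:, 0]  # true CATE, driven by a shared covariate
    if randomized:
        A = rng.binomial(1, E_RCT, size=n)
    else:
        # Treatment depends on a covariate, as in an observational study
        A = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 1])))
    # os_bias shifts the apparent effect, standing in for OS-specific bias
    Y = X[:, 1] + A * (tau + os_bias) + rng.normal(size=n)
    return X, A, Y, tau


def embed(X):
    """Placeholder embedding: keep the shared columns. CALM would learn this map per source."""
    return X[:, :2]


def fit_linear(Z, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    D = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta


def predict_linear(beta, Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ beta


# 1) Large OS: fit one outcome model per arm in the embedding space.
X_os, A_os, Y_os, _ = simulate(20000, randomized=False, os_bias=0.8)
Z_os = embed(X_os)
beta1 = fit_linear(Z_os[A_os == 1], Y_os[A_os == 1])
beta0 = fit_linear(Z_os[A_os == 0], Y_os[A_os == 0])

# 2) Small RCT: transfer the OS models through the RCT-side embedding.
X_rct, A_rct, Y_rct, tau_true = simulate(2000, randomized=True)
Z_rct = embed(X_rct)
tau_raw = predict_linear(beta1, Z_rct) - predict_linear(beta0, Z_rct)

# 3) Calibrate on trial data. Under randomization with known probability e,
#    the transformed outcome Y (A - e) / (e (1 - e)) has conditional mean
#    equal to the CATE, so regressing it on tau_raw recalibrates the OS estimate.
pseudo = Y_rct * (A_rct - E_RCT) / (E_RCT * (1.0 - E_RCT))
gamma = fit_linear(tau_raw[:, None], pseudo)
tau_cal = predict_linear(gamma, tau_raw[:, None])

mse_raw = np.mean((tau_raw - tau_true) ** 2)
mse_cal = np.mean((tau_cal - tau_true) ** 2)
print(f"MSE of transferred OS estimate: {mse_raw:.3f}")
print(f"MSE after RCT calibration:      {mse_cal:.3f}")
```

In this toy setting the OS estimate carries a constant bias, so a two-parameter calibration is enough to correct it. Richer calibration maps, or a learned embedding in place of `embed`, would slot into steps 2 and 3 without changing the overall structure.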