We study causal effect estimation from a mixture of observational and interventional data in a confounded linear regression model with multivariate treatments. We show that statistical efficiency, measured by expected squared error, can be improved by combining estimators arising from both the observational and the interventional settings. To this end, we derive methods based on matrix-weighted linear estimators and prove that they are asymptotically unbiased, i.e., their bias vanishes in the infinite-sample limit. This is an important improvement over the pooled estimator, which uses the union of interventional and observational data and whose bias vanishes only if the ratio of observational to interventional data tends to zero. Experiments on synthetic data confirm our theoretical findings. In settings where confounding is substantial and the ratio of observational to interventional data is large, our estimators outperform a Stein-type estimator and various other baselines.
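To make the contrast with the pooled estimator concrete, the following is a minimal sketch. It is not the estimator proposed here; it only illustrates the baseline. Ordinary least squares on the pooled data can itself be written as a matrix-weighted combination W b_obs + (I - W) b_int of the observational and interventional OLS estimates, with W = (X_obs^T X_obs + X_int^T X_int)^{-1} X_obs^T X_obs. When observational samples dominate, W approaches the identity, so the pooled estimate inherits the confounding bias of b_obs. The data-generating process (a scalar hidden confounder h, loading vector a, confounding strength gamma) and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients for y ≈ X @ beta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def simulate(n, beta, a, gamma, rng, interventional):
    """Hypothetical confounded linear model with a scalar hidden confounder h.

    Observational: X = h * a + noise, so the treatments are correlated with h.
    Interventional: X is set independently of h (a randomized intervention).
    In both regimes y = X @ beta + gamma * h + small noise.
    """
    h = rng.standard_normal(n)
    X = rng.standard_normal((n, beta.size))
    if not interventional:
        X = X + np.outer(h, a)  # confounding enters the treatments
    y = X @ beta + gamma * h + 0.1 * rng.standard_normal(n)
    return X, y


def matrix_weighted_pooled(X_obs, y_obs, X_int, y_int):
    """Pooled OLS written as W @ b_obs + (I - W) @ b_int.

    W = (G_obs + G_int)^{-1} G_obs, where G = X^T X is the Gram matrix.
    """
    G_obs = X_obs.T @ X_obs
    G_int = X_int.T @ X_int
    W = np.linalg.solve(G_obs + G_int, G_obs)
    b_obs, b_int = ols(X_obs, y_obs), ols(X_int, y_int)
    return W @ b_obs + (np.eye(W.shape[0]) - W) @ b_int, W


rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5])  # true causal effect (hypothetical)
a = np.array([1.0, 1.0])      # confounder loading on the treatments
gamma = 2.0                   # confounder effect on the outcome

# Many observational samples, few interventional ones.
X_obs, y_obs = simulate(10_000, beta, a, gamma, rng, interventional=False)
X_int, y_int = simulate(100, beta, a, gamma, rng, interventional=True)

b_int = ols(X_int, y_int)
b_pool = ols(np.vstack([X_obs, X_int]), np.concatenate([y_obs, y_int]))
b_mw, W = matrix_weighted_pooled(X_obs, y_obs, X_int, y_int)

print("interventional error:", np.linalg.norm(b_int - beta))
print("pooled error:        ", np.linalg.norm(b_pool - beta))
```

In this toy setup the pooled weight matrix W is close to the identity, so the pooled estimate stays near the biased observational estimate, whereas the interventional estimate is unbiased but noisier. Estimators that choose the matrix weights differently aim to trade off these two error sources.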