Several methods have recently been proposed for estimating causal effects with confidence intervals that are uniformly valid over a set of data generating processes when high-dimensional nuisance models are estimated by post-model-selection or machine learning estimators. These methods typically require that all confounders be observed to ensure identification of the effects. We contribute by showing how valid semiparametric inference can be obtained in the presence of unobserved confounders and high-dimensional nuisance models. We propose uncertainty intervals that allow for unobserved confounding, and show that the resulting inference is valid when the amount of unobserved confounding is small relative to the sample size; the latter condition is formalized in terms of convergence rates. Simulation experiments illustrate the finite-sample properties of the proposed intervals and investigate an alternative procedure that improves their empirical coverage when the amount of unobserved confounding is large. Finally, a case study on the effect of smoking during pregnancy on birth weight illustrates how the proposed methods can be used to perform a sensitivity analysis to unobserved confounding.