For a data-generating process whose random variables can be described by a linear structural equation model, we consider situations in which (i) a set of covariates satisfying the back-door criterion cannot be observed, or (ii) such a set can be observed, but standard statistical methods cannot be used to estimate causal effects because of multicollinearity or high dimensionality. We propose a novel two-stage penalized regression approach, the penalized covariate-mediator selection operator (PCM Selector), to estimate causal effects in these scenarios. Unlike existing penalized regression analyses, PCM Selector yields a consistent, or at least less biased, estimator of the causal effect when a set of intermediate variables is available. In addition, PCM Selector includes a procedure for selecting intermediate variables that achieves better estimation accuracy for causal effects than adjustment based on the back-door criterion.
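The abstract does not spell out the PCM Selector algorithm, so the following is only a minimal sketch of the general idea it builds on: in a linear structural equation model, the effect of a treatment X on an outcome Y that acts entirely through intermediate variables M can be recovered as a product of coefficients, even when a confounder of X and Y is unobserved. Everything here is an assumption for illustration: the simulated data-generating process, the hypothetical helper names `lasso_cd` and `two_stage_mediator_effect`, the use of OLS in stage 1 and a plain lasso in stage 2, and the penalty `lam`. It does not implement the covariate-selection part of PCM Selector.

```python
import numpy as np


def lasso_cd(Z, y, lam, n_iter=1000, tol=1e-10):
    """Lasso by cyclic coordinate descent (hypothetical helper).

    Minimizes (1 / (2n)) * ||y - c - Z b||^2 + lam * ||b||_1; the unpenalized
    intercept c is removed by centering Z and y.
    """
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    col_sq = (Zc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    resid = yc.copy()  # current residual yc - Zc @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back).
            rho = Zc[:, j] @ resid / n + col_sq[j] * b[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                resid -= delta * Zc[:, j]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def two_stage_mediator_effect(x, M, y, lam=0.01):
    """Hypothetical product-of-coefficients estimator of the effect of x on y.

    Stage 1: OLS slope of each candidate mediator on x.
    Stage 2: lasso of y on (x, M); x stays in the model so the mediator
    coefficients are not contaminated by the x-y confounder.
    """
    xc = x - x.mean()
    alpha = (M - M.mean(axis=0)).T @ xc / (xc @ xc)
    coef = lasso_cd(np.column_stack([x, M]), y, lam)
    beta = coef[1:]
    return float(alpha @ beta), alpha, beta


# Assumed linear SEM: u confounds x and y; x acts on y only through M[:, 0:2].
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)                # unobserved confounder
x = u + rng.normal(size=n)            # treatment
M = np.column_stack([
    0.8 * x + rng.normal(size=n),     # mediator 1
    0.5 * x + rng.normal(size=n),     # mediator 2
    rng.normal(size=(n, 3)),          # three irrelevant candidate variables
])
y = 1.0 * M[:, 0] + 0.6 * M[:, 1] + 2.0 * u + rng.normal(size=n)

est, alpha, beta = two_stage_mediator_effect(x, M, y)
xc_all = x - x.mean()
naive = float(xc_all @ (y - y.mean()) / (xc_all @ xc_all))  # confounded slope
print(f"two-stage estimate: {est:.3f}, naive slope: {naive:.3f}")
```

In this assumed model the true causal effect is 0.8 × 1.0 + 0.5 × 0.6 = 1.1, whereas the naive regression of y on x targets 2.1 because of the unobserved confounder u. The small penalty shrinks the mediator coefficients slightly toward zero, which is one reason a dedicated method such as PCM Selector would treat the selection and estimation stages more carefully than this sketch does.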