In many empirical settings, directly observing the treatment variable may be infeasible, although an error-prone surrogate measurement of it is often available. Causal inference based solely on the observed surrogate of the hidden treatment can be particularly challenging without additional assumptions or auxiliary data. To address this issue, we propose a method that carefully combines the surrogate measurement with a proxy of the hidden treatment to identify its causal effect on any scale for which identification would in principle be feasible had the treatment, contrary to fact, been observed error-free. Beyond identification, we provide general semiparametric theory for causal effects identified using our approach, and we derive a large class of semiparametric estimators with an appealing multiple robustness property. A significant obstacle to our approach is the estimation of nuisance functions involving the hidden treatment, which prevents the direct application of standard machine learning algorithms. To resolve this, we introduce a novel semiparametric EM algorithm, adding a practical dimension to our theoretical contributions. The methodology can be adapted to analyze a large class of causal parameters in the proposed hidden treatment model, including the population average treatment effect, the effect of treatment on the treated, quantile treatment effects, and causal effects under marginal structural models. We examine the finite-sample performance of our method using simulations and an application that aims to estimate the causal effect of Alzheimer's disease on hippocampal volume using data from the Alzheimer's Disease Neuroimaging Initiative.
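To illustrate why relying on the surrogate alone is problematic, the following is a minimal simulation sketch and is not taken from the paper. It assumes a randomized binary hidden treatment, a binary surrogate with non-differential misclassification, and a linear outcome; the function name `simulate` and the values of `effect`, `sensitivity`, and `specificity` are hypothetical choices made only for illustration. It does not implement the proposed estimators or the semiparametric EM algorithm.

```python
import random


def simulate(n=200_000, effect=2.0, sensitivity=0.8, specificity=0.9, seed=0):
    """Compare difference-in-means contrasts using the hidden treatment
    (oracle) and its error-prone surrogate (naive).

    All parameter values are illustrative assumptions, not from the paper.
    """
    rng = random.Random(seed)
    sums = {(k, v): 0.0 for k in ("true", "surrogate") for v in (0, 1)}
    counts = {(k, v): 0 for k in ("true", "surrogate") for v in (0, 1)}

    for _ in range(n):
        # Hidden binary treatment, randomized with probability 1/2.
        a = 1 if rng.random() < 0.5 else 0
        # Outcome depends on the hidden treatment only.
        y = effect * a + rng.gauss(0.0, 1.0)
        # Surrogate: misclassified version of the treatment, with error
        # that does not depend on the outcome (non-differential).
        if a == 1:
            s = 1 if rng.random() < sensitivity else 0
        else:
            s = 0 if rng.random() < specificity else 1
        for key, val in (("true", a), ("surrogate", s)):
            sums[(key, val)] += y
            counts[(key, val)] += 1

    def diff_in_means(key):
        return (sums[(key, 1)] / counts[(key, 1)]
                - sums[(key, 0)] / counts[(key, 0)])

    return diff_in_means("true"), diff_in_means("surrogate")


if __name__ == "__main__":
    oracle, naive = simulate()
    print(f"oracle contrast (hidden treatment): {oracle:.3f}")
    print(f"naive contrast (surrogate only):    {naive:.3f}")
```

Under these assumptions the naive contrast converges to the true effect multiplied by P(A=1 | S=1) − P(A=1 | S=0), which is roughly 0.71 for the values above, so the surrogate-only estimate is attenuated toward zero (about 1.41 instead of 2). This is the kind of bias that motivates bringing in an additional proxy of the hidden treatment.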