In many empirical settings, directly observing a treatment variable may be infeasible, although an error-prone surrogate measurement of it will often be available. Causal inference based solely on the observed surrogate measurement of the hidden treatment may be particularly challenging without an additional assumption or auxiliary data. To address this issue, we propose a method that carefully incorporates the surrogate measurement together with a proxy of the hidden treatment to identify its causal effect on any scale for which identification would in principle be feasible had the treatment, contrary to fact, been observed without error. Beyond identification, we provide general semiparametric theory for causal effects identified using our approach, and we derive a large class of semiparametric estimators with an appealing multiple robustness property. A significant obstacle to our approach is the estimation of nuisance functions involving the hidden treatment, which prevents the direct application of standard machine learning algorithms. To resolve this, we introduce a novel semiparametric expectation-maximization (EM) algorithm, thus adding a practical dimension to our theoretical contributions. This methodology can be adapted to analyze a large class of causal parameters in the proposed hidden treatment model, including the population average treatment effect, the effect of treatment on the treated, quantile treatment effects, and causal effects under marginal structural models. We examine the finite-sample performance of our method using simulations and an application that estimates the causal effect of Alzheimer's disease on hippocampal volume using data from the Alzheimer's Disease Neuroimaging Initiative.