In many empirical settings, directly observing a treatment variable may be infeasible, although an error-prone surrogate measurement of it is often available. Causal inference based solely on the surrogate measurement is particularly challenging without validation data. We propose a method that obviates the need for validation data by carefully combining the surrogate measurement with a proxy of the hidden treatment to obtain nonparametric identification of several causal effects of interest, including the population average treatment effect, the effect of treatment on the treated, quantile treatment effects, and causal effects under marginal structural models. For inference, we provide general semiparametric theory for causal effects identified using our approach and derive a large class of semiparametric efficient estimators with an appealing multiple robustness property. A significant obstacle to our approach is the estimation of nuisance functions that involve the hidden treatment, which prevents the direct use of standard machine learning algorithms; we resolve this by introducing a novel semiparametric EM algorithm. We examine the finite-sample performance of our method using simulations and an application that aims to estimate the causal effect of Alzheimer's disease on hippocampal volume using data from the Alzheimer's Disease Neuroimaging Initiative.
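To illustrate why causal inference based solely on a surrogate is problematic, the following toy simulation compares a difference in means computed with the hidden treatment against one computed with a misclassified surrogate. It is a minimal sketch under assumptions that are not taken from the paper: a randomized binary treatment, a constant effect of 2.0, and a surrogate that flips the true treatment with probability 0.2 independently of the outcome. It does not implement the proposed identification strategy or the semiparametric EM algorithm, and the names `simulate` and `diff_in_means` are hypothetical.

```python
import random


def diff_in_means(groups, outcomes):
    """Mean outcome in group 1 minus mean outcome in group 0."""
    treated = [y for g, y in zip(groups, outcomes) if g == 1]
    control = [y for g, y in zip(groups, outcomes) if g == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate(n=100_000, effect=2.0, flip_prob=0.2, seed=0):
    """Return (oracle, naive) difference-in-means estimates.

    Assumed toy design (not from the paper):
      - hidden treatment A ~ Bernoulli(0.5), randomized
      - surrogate A* equals A, flipped with probability flip_prob
      - outcome Y = effect * A + standard normal noise
    """
    rng = random.Random(seed)
    hidden, surrogate, outcome = [], [], []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0
        a_star = 1 - a if rng.random() < flip_prob else a
        y = effect * a + rng.gauss(0.0, 1.0)
        hidden.append(a)
        surrogate.append(a_star)
        outcome.append(y)
    oracle = diff_in_means(hidden, outcome)    # uses the unobservable treatment
    naive = diff_in_means(surrogate, outcome)  # uses only the surrogate
    return oracle, naive


if __name__ == "__main__":
    oracle, naive = simulate()
    print(f"oracle estimate (hidden treatment): {oracle:.3f}")
    print(f"naive estimate (surrogate only):    {naive:.3f}")
```

In this design the naive estimate is attenuated toward zero by a factor of roughly 1 - 2 × 0.2 = 0.6, so it concentrates near 1.2 rather than 2.0. Correcting such bias ordinarily requires knowing the misclassification rate, for example from validation data, which is the requirement the abstract's method is designed to avoid.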