We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Because the long-term outcome is observed only after a long delay, it is not measured in the experimental data and is recorded only in the observational data. Both types of data, however, include observations of some short-term outcomes. In this paper, we tackle the challenge of persistent unmeasured confounders, i.e., unmeasured confounders that can simultaneously affect the treatment, the short-term outcomes, and the long-term outcome, and that invalidate the identification strategies in the previous literature. To address this challenge, we exploit the sequential structure of multiple short-term outcomes and develop three novel identification strategies for the average long-term treatment effect. We further propose three corresponding estimators and prove their consistency and asymptotic normality. Finally, we apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data, and we show numerically that our proposals outperform existing methods that fail to handle persistent confounders.