The Difference-in-Differences (DiD) method is a fundamental tool for causal inference, yet its application is often complicated by missing data. Although recent work has developed robust DiD estimators for complex settings such as staggered treatment adoption, these methods typically assume complete data and do not address outcomes that are missing at random (MAR), a common problem that invalidates standard estimators. We develop a rigorous framework, rooted in semiparametric theory, for identifying and efficiently estimating the Average Treatment Effect on the Treated (ATT) when pre-treatment outcomes, post-treatment outcomes, or both are missing at random. We first establish nonparametric identification of the ATT under two minimal sets of sufficient conditions. For each set, we derive the semiparametric efficiency bound, which provides a formal benchmark for asymptotic optimality. We then propose novel estimators that are asymptotically efficient, that is, they attain this bound. A key feature of our estimators is their multiple robustness: they remain consistent even if some of the nuisance function models are misspecified. We validate these properties and illustrate the estimators' broad applicability through an extensive simulation study.
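To make the missing-outcome problem concrete, the following is a minimal sketch of a two-period DiD in which the post-treatment outcome is missing at random given a binary covariate. It is not the estimator proposed in the paper: it contrasts a naive complete-case DiD with a simple inverse-probability-weighted (IPW) version, assuming unconditional parallel trends and a hypothetical data-generating process. All names (`simulate`, `did_complete_case`, `did_ipw`) and all numeric parameters are illustrative assumptions.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Hypothetical two-period panel with a MAR post-treatment outcome."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)               # binary covariate
    d = rng.binomial(1, 0.3 + 0.4 * x)        # treatment is more likely when x = 1
    y0 = 1.0 + 2.0 * x + rng.normal(size=n)   # pre-treatment outcome
    tau = 1.0 + 2.0 * x                       # unit-level effect, heterogeneous in x
    # Common trend of 1.0 for everyone, so parallel trends holds by construction.
    y1 = y0 + 1.0 + tau * d + rng.normal(size=n)
    # MAR: the chance of observing y1 depends only on the observed covariate x.
    r = rng.binomial(1, 0.9 - 0.6 * x)
    y1 = np.where(r == 1, y1, np.nan)         # mask the unobserved outcomes
    return x, d, y0, y1, r, tau


def did_complete_case(d, y0, y1, r):
    """Naive DiD that simply drops units whose post-treatment outcome is missing."""
    delta = y1 - y0
    obs = r == 1
    return delta[obs & (d == 1)].mean() - delta[obs & (d == 0)].mean()


def did_ipw(x, d, y0, y1, r):
    """DiD with observed units reweighted by 1 / P(observed | x, d)."""
    delta = np.where(r == 1, y1 - y0, 0.0)    # missing rows get weight 0 below
    pi_hat = np.empty(len(x))
    for xv in (0, 1):
        for dv in (0, 1):
            cell = (x == xv) & (d == dv)
            pi_hat[cell] = r[cell].mean()     # cell-frequency estimate of P(R=1 | x, d)
    w = r / pi_hat
    treated, control = d == 1, d == 0
    # Normalized (Hajek-style) weighted means of the outcome change in each arm.
    mean_t = np.sum(w[treated] * delta[treated]) / np.sum(w[treated])
    mean_c = np.sum(w[control] * delta[control]) / np.sum(w[control])
    return mean_t - mean_c


if __name__ == "__main__":
    x, d, y0, y1, r, tau = simulate()
    print("sample ATT       :", tau[d == 1].mean())
    print("complete-case DiD:", did_complete_case(d, y0, y1, r))
    print("IPW DiD          :", did_ipw(x, d, y0, y1, r))
```

In this setup the complete-case estimate is biased because units with `x = 1`, which have larger treatment effects, are observed less often, so the observed treated sample under-represents them. Reweighting by the estimated observation probability restores the covariate mix of the treated group. The sketch relies on a single, correctly specified missingness model; it does not have the multiple robustness or efficiency properties described in the abstract.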