This paper addresses one of the most prevalent problems encountered by political scientists working with the difference-in-differences (DID) design: missingness in panel data. A common practice for handling missing data, known as complete case analysis, is to drop cases with any missing values over time. More principled approaches use nonparametric bounds on causal effects or apply inverse probability weighting based on baseline covariates. Yet these methods are general remedies that often under-utilize the assumptions already imposed on the panel structure for causal identification. In this paper, I outline the pitfalls of complete case analysis and propose an alternative identification strategy based on principal strata. Specifically, I impose a parallel trends assumption within each latent group that shares the same missingness pattern (e.g., always-respondents, if-treated-respondents) and leverage missingness rates over time to estimate the proportions of these groups. Building on this, I tailor Lee bounds, a well-known set of nonparametric bounds under selection bias, to partially identify the causal effect within the DID design. Unlike complete case analysis, the proposed method does not require independence between treatment selection and missingness patterns, nor does it assume homogeneous effects across these patterns.
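For readers unfamiliar with the trimming idea behind Lee bounds, the following is a minimal sketch of the standard cross-sectional version, not the DID-tailored procedure proposed in this paper. It assumes a binary treatment, a binary response indicator, and monotonicity in the sense that treatment never reduces the probability of responding. The function name `lee_bounds` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np


def lee_bounds(y, d, s):
    """Standard trimming bounds on the effect among always-respondents.

    y : outcomes (may be NaN where s == 0)
    d : treatment indicator (1 = treated, 0 = control)
    s : response indicator (1 = outcome observed, 0 = missing)

    Illustrative assumption: treatment weakly increases response,
    so the treated group is the one that gets trimmed.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=bool)
    s = np.asarray(s, dtype=bool)

    q1 = s[d].mean()   # response rate among treated units
    q0 = s[~d].mean()  # response rate among control units
    if q1 == 0 or q1 < q0:
        raise ValueError("sketch requires q1 > 0 and q1 >= q0")

    # Share of treated respondents who respond only because of treatment.
    p = (q1 - q0) / q1

    y1 = np.sort(y[d & s])  # observed treated outcomes, ascending
    y0 = y[~d & s]          # observed control outcomes
    k = int(np.floor(p * y1.size))  # number of treated outcomes to trim

    # Lower bound: drop the k largest treated outcomes.
    lower = y1[: y1.size - k].mean() - y0.mean()
    # Upper bound: drop the k smallest treated outcomes.
    upper = y1[k:].mean() - y0.mean()
    return lower, upper, p
```

The trimming share `p` is computed from response rates alone, which mirrors the abstract's idea of using missingness rates to estimate the proportions of latent response groups. Extending this to the DID setting would require the within-stratum parallel trends assumption described above, which this sketch does not implement.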