Synthetic control (SC) models are widely used to estimate causal effects in settings with observational time-series data. To identify the causal effect on a target unit, SC requires the existence of correlated units that are not impacted by the intervention. Given one of these potential donor units, how can we decide whether it is in fact a valid donor, that is, one not subject to spillover effects from the intervention? Such a decision typically requires appealing to strong a priori domain knowledge about the units, which becomes infeasible when the pool of potential donors is large. In this paper, we introduce a practical, theoretically grounded donor selection procedure that aims to weaken this domain knowledge requirement. Our main result is a theorem that gives the assumptions required to identify donor values at post-intervention time points using only pre-intervention data. We show how this theorem, and the assumptions underpinning it, can be turned into a practical method for detecting potential spillover effects and excluding invalid donors when constructing SCs. Importantly, we employ sensitivity analysis to formally bound the bias in our SC causal estimate in situations where an excluded donor was in fact valid, or where a selected donor was invalid. Using ideas from the proximal causal inference and instrumental variables literature, we show that the excluded donors can nevertheless be leveraged to further debias causal effect estimates. Finally, we illustrate our donor selection procedure on both simulated and real-world datasets.