Causal inference with panel data is a central problem in econometrics. A fundamental version of the problem is as follows. Let $M^*$ be a low-rank matrix and $E$ a zero-mean noise matrix. For a `treatment' matrix $Z$ with entries in $\{0,1\}$, we observe the matrix $O$ with entries $O_{ij} := M^*_{ij} + E_{ij} + \mathcal{T}_{ij} Z_{ij}$, where the $\mathcal{T}_{ij}$ are unknown, heterogeneous treatment effects. The goal is to estimate the average treatment effect $\tau^* := \sum_{ij} \mathcal{T}_{ij} Z_{ij} / \sum_{ij} Z_{ij}$. The synthetic control paradigm provides an approach to estimating $\tau^*$ when the support of $Z$ lies in a single row. This paper extends that framework to allow rate-optimal recovery of $\tau^*$ for general $Z$, thus broadly expanding its applicability. Our guarantees are the first of their type in this general setting. Computational experiments on synthetic and real-world data show a substantial advantage over competing estimators.
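To make the observation model concrete, the following is a minimal simulation sketch of $O_{ij} = M^*_{ij} + E_{ij} + \mathcal{T}_{ij} Z_{ij}$ and of the target quantity $\tau^*$. It is not the estimator proposed in the paper. The function names (`simulate_panel`, `average_treatment_effect`), the matrix sizes, the Gaussian noise, the i.i.d. Bernoulli treatment pattern, and the distribution of treatment effects are all illustrative assumptions that do not come from the source.

```python
import numpy as np


def simulate_panel(n=50, m=40, rank=3, noise_sd=0.1, treat_prob=0.2, seed=0):
    """Draw one synthetic panel from the observation model.

    All distributional choices here are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)

    # Low-rank baseline M* = U V^T with the chosen rank.
    U = rng.normal(size=(n, rank))
    V = rng.normal(size=(m, rank))
    M_star = U @ V.T

    # Zero-mean noise matrix E.
    E = rng.normal(scale=noise_sd, size=(n, m))

    # Treatment matrix Z with entries in {0, 1}; a general pattern,
    # not restricted to a single row.
    Z = (rng.random((n, m)) < treat_prob).astype(int)

    # Heterogeneous treatment effects T_ij (unknown in practice).
    T = 1.0 + rng.normal(scale=0.5, size=(n, m))

    # Observed matrix O_ij = M*_ij + E_ij + T_ij * Z_ij.
    O = M_star + E + T * Z
    return O, Z, T, M_star


def average_treatment_effect(T, Z):
    """Compute tau* = sum_ij T_ij Z_ij / sum_ij Z_ij."""
    n_treated = Z.sum()
    if n_treated == 0:
        raise ValueError("Z has no treated entries; tau* is undefined.")
    return (T * Z).sum() / n_treated


O, Z, T, M_star = simulate_panel()
tau_star = average_treatment_effect(T, Z)
```

In a real application only `O` and `Z` are observed. `T` and `M_star` are returned here solely so that an estimate can be compared against the ground-truth `tau_star` in a simulation.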