Distributional Counterfactual Analysis in High-Dimensional Setup

In the context of treatment effect estimation, this paper proposes a new methodology to recover the counterfactual distribution when there is a single treated unit (or a few) and a possibly high-dimensional number of potential controls observed in a panel structure. The methodology accommodates, but does not require, the number of units being larger than the number of time periods (the high-dimensional setup). Rather than modeling only the conditional mean, we propose to model the entire conditional quantile function (CQF) in the absence of intervention and to estimate it from the pre-intervention period by an l1-penalized regression. We derive non-asymptotic bounds for the estimated CQF that are valid uniformly over the quantiles. The bounds are explicit in terms of the number of time periods, the number of control units, the weak dependence coefficient (beta-mixing), and the tail decay of the random variables. The results allow practitioners to reconstruct the entire counterfactual distribution. Moreover, we bound the coverage probability of the estimated CQF, which can be used to construct valid confidence intervals for the (possibly random) treatment effect in every post-intervention period. We also propose a new hypothesis test for the sharp null of no effect, based on the Lp norm of the deviation of the estimated CQF from the population one. Interestingly, the null distribution is quasi-pivotal in the sense that it depends only on the estimated CQF, the Lp norm, and the number of post-intervention periods, but not on the size of the pre-intervention period. For that reason, critical values can be easily simulated. We illustrate the methodology by revisiting the empirical study in Acemoglu, Johnson, Kermani, Kwak and Mitton (2016).
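
For concreteness, the estimation step described above can be written in the textbook form of an l1-penalized quantile regression. This is only a sketch under assumed notation, not necessarily the exact estimator analyzed in the paper: $T_0$ (number of pre-intervention periods), $y_t$ (outcome of the treated unit), $x_t$ (vector of control-unit outcomes), $\lambda$ (penalty level), and $\mathcal{T}$ (set of quantile levels) are all symbols introduced here for illustration.

```latex
% Check (pinball) loss at quantile level \tau
\rho_\tau(u) = u \,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr), \qquad \tau \in \mathcal{T} \subset (0,1)

% l1-penalized quantile regression fitted on the pre-intervention sample t = 1, \dots, T_0
\hat{\beta}(\tau) \in \arg\min_{\beta}\;
  \frac{1}{T_0} \sum_{t=1}^{T_0} \rho_\tau\bigl(y_t - x_t^{\top}\beta\bigr)
  + \lambda \,\lVert \beta \rVert_1

% Estimated counterfactual CQF for a post-intervention period t > T_0
\hat{Q}_t(\tau) = x_t^{\top}\hat{\beta}(\tau)
```

Under this reading, sweeping $\tau$ over $\mathcal{T}$ traces out the estimated counterfactual distribution for each post-intervention period. The paper's own specification (for example the choice of $\lambda$, any penalty weights, or an intercept) may differ from this generic form.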
