Treatment effect estimation under unconfoundedness is a fundamental task in causal inference. In response to the challenge of analyzing high-dimensional datasets collected in substantive fields such as epidemiology, genetics, economics, and the social sciences, many methods for treatment effect estimation with high-dimensional nuisance parameters (the outcome regression and the propensity score) have been developed in recent years. However, it remains unclear what the necessary and sufficient sparsity condition on the nuisance parameters is for the treatment effect to be $\sqrt{n}$-estimable. In this paper, we propose a new Double-Calibration strategy that corrects the estimation bias of nuisance parameter estimates computed by regularized high-dimensional techniques. We demonstrate that the corresponding Doubly-Calibrated estimator achieves the $1/\sqrt{n}$ rate as long as one of the nuisance parameters is sparse, with sparsity below $\sqrt{n} / \log p$, where $p$ denotes the ambient dimension of the covariates; the other nuisance parameter can be arbitrarily complex and completely misspecified. The Double-Calibration strategy can also be applied to settings other than treatment effect estimation, e.g., regression coefficient estimation in the presence of a diverging number of controls in a semiparametric partially linear model.
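For readers who prefer symbols, the main claim can be written out roughly as follows. This is only a sketch under assumed notation: the symbols $A$, $Y(a)$, $X$, $\mu$, $\pi$, $\tau$, $s_\mu$, and $s_\pi$ are introduced here for illustration and do not appear in the abstract, the average treatment effect is taken as the target for concreteness, and the precise form of the rate condition (constants, "$\ll$" versus "$\lesssim$") is not specified by the text above.

```latex
% Assumed setup: i.i.d. data (X_i, A_i, Y_i), i = 1, ..., n, with covariates
% X in R^p, binary treatment A, and potential outcomes Y(0), Y(1).
\begin{align*}
  &\text{Unconfoundedness:} && \{Y(0), Y(1)\} \perp A \mid X, \\
  &\text{Outcome regression:} && \mu(a, x) = \mathbb{E}[\,Y \mid A = a,\, X = x\,], \\
  &\text{Propensity score:} && \pi(x) = \mathbb{P}(A = 1 \mid X = x), \\
  &\text{Target (average treatment effect):} && \tau = \mathbb{E}[\,\mu(1, X) - \mu(0, X)\,].
\end{align*}
% Claim as stated in the abstract, with s_mu and s_pi denoting the (assumed)
% sparsity levels of the two nuisance parameters:
\[
  \min\{s_\mu,\, s_\pi\} \ll \frac{\sqrt{n}}{\log p}
  \quad \Longrightarrow \quad
  \hat{\tau} - \tau = O_P\!\left(n^{-1/2}\right),
\]
% with no sparsity or correct-specification requirement on the other nuisance.
```

The condition constrains only the sparser of the two nuisance parameters, which is why it is written with a minimum rather than as a joint requirement on both.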