Treatment effect estimation under unconfoundedness is a fundamental task in causal inference. In response to the challenge of analyzing high-dimensional datasets collected in substantive fields such as epidemiology, genetics, economics, and the social sciences, various methods for treatment effect estimation with high-dimensional nuisance parameters (the outcome regression and the propensity score) have been developed in recent years. However, it remains unclear what sparsity condition on the nuisance parameters is necessary and sufficient for estimating the treatment effect at the $1 / \sqrt{n}$ rate. In this paper, we propose a new Double-Calibration strategy that corrects the estimation bias of nuisance parameter estimates computed by regularized high-dimensional techniques, and we demonstrate that the corresponding Doubly-Calibrated estimator achieves the $1 / \sqrt{n}$ rate as long as one of the nuisance parameters is sparse with sparsity below $\sqrt{n} / \log p$, where $p$ denotes the ambient dimension of the covariates, whereas the other nuisance parameter can be arbitrarily complex and completely misspecified. The Double-Calibration strategy can also be applied to settings other than treatment effect estimation, e.g., regression coefficient estimation in the presence of a diverging number of controls in a semiparametric partially linear model, and local average treatment effect estimation with instrumental variables.
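To give a sense of scale for the sparsity bound $\sqrt{n} / \log p$ stated in the abstract, the following minimal sketch evaluates it for a few sample sizes and dimensions. It assumes the natural logarithm and ignores any constant factors, neither of which the abstract specifies; the function name `sparsity_threshold` and the example values of $n$ and $p$ are hypothetical and chosen only for illustration.

```python
import math


def sparsity_threshold(n: int, p: int) -> float:
    """Evaluate sqrt(n) / log(p), the order of the sparsity bound.

    Assumptions (not stated in the abstract): natural logarithm,
    and all constant factors dropped.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if p <= 1:
        raise ValueError("p must exceed 1 so that log(p) is positive")
    return math.sqrt(n) / math.log(p)


if __name__ == "__main__":
    # Hypothetical (n, p) pairs, chosen only to illustrate the scaling.
    for n, p in [(1_000, 500), (10_000, 1_000), (100_000, 5_000)]:
        bound = sparsity_threshold(n, p)
        print(f"n={n:>7}, p={p:>5}: sqrt(n)/log(p) = {bound:.2f}")
```

Under these assumptions, the bound grows with the square root of the sample size and shrinks only logarithmically in the dimension, so quadrupling $n$ doubles it while increasing $p$ has a much weaker effect.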