We consider estimation of average treatment effects given observational data with high-dimensional pretreatment variables. Existing methods for this problem typically assume some form of sparsity for the regression functions. In this work, we introduce a debiased inverse propensity score weighting (DIPW) scheme for average treatment effect estimation that delivers $\sqrt{n}$-consistent estimates when the propensity score follows a sparse logistic regression model; the outcome regression functions are permitted to be arbitrarily complex. We further demonstrate how confidence intervals centred on our estimates may be constructed. Our theoretical results quantify the price of permitting the regression functions to be unestimable: under mild conditions, the variance of the estimator is inflated by a constant factor relative to the semiparametric efficient variance. We also show that when outcome regressions can be estimated faster than a slow $1/\sqrt{\log n}$ rate, our estimator achieves semiparametric efficiency. As our results accommodate arbitrary outcome regression functions, averages of transformed responses under each treatment may also be estimated at the $\sqrt{n}$ rate. Thus, for example, the variances of the potential outcomes may be estimated. We discuss extensions to estimating linear projections of the heterogeneous treatment effect function and explain how propensity score models with more general link functions may be handled within our framework. An R package \texttt{dipw} implementing our methodology is available on CRAN.
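As background, the textbook (non-debiased) inverse propensity weighting estimator of the average treatment effect under a logistic propensity model may be written as below. This is the standard form only, not the DIPW scheme itself, and the notation is assumed here for illustration: $Y_i$ is the response, $T_i \in \{0,1\}$ the treatment indicator, $X_i$ the vector of pretreatment variables, and $\hat{\gamma}$ an estimate of the (sparse) logistic regression coefficients.

\[
\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{T_i Y_i}{\hat{\pi}(X_i)} - \frac{(1 - T_i) Y_i}{1 - \hat{\pi}(X_i)} \right\},
\qquad
\hat{\pi}(x) = \frac{1}{1 + \exp(-x^{\top} \hat{\gamma})}.
\]

Replacing $Y_i$ by a transformed response such as $Y_i^2$ in either term gives the corresponding estimate of the average of that transformed response under each treatment.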