Anecdotally, using an estimated propensity score is superior to using the true propensity score when estimating the average treatment effect from observational data. However, this claim comes with several qualifications: it holds only if the propensity score model is correctly specified and the number of covariates $d$ is small relative to the sample size $n$. We revisit this phenomenon by studying the inverse propensity score weighting (IPW) estimator based on a logistic model with a diverging number of covariates. We first show that the IPW estimator based on the estimated propensity score is consistent and asymptotically normal, with smaller variance than the oracle IPW estimator (which uses the true propensity score), if and only if $n \gtrsim d^2$. We then propose a debiased IPW estimator that achieves the same guarantees in the regime $n \gtrsim d^{3/2}$. Our proofs rely on a novel non-asymptotic decomposition of the IPW error along with careful control of the higher-order terms.
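The comparison between the oracle and estimated-propensity IPW estimators can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration rather than the setup analyzed here: the data-generating process, the intercept column in the design matrix, the Newton solver, and the names `fit_logistic`, `ipw_ate`, and `simulate` are all hypothetical. It covers only the low-dimensional regime where $n$ is much larger than $d^2$, and it does not implement the debiased estimator.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, t, n_iter=25):
    """Unregularized logistic MLE via Newton's method (illustrative solver)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (t - p)                         # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def ipw_ate(y, t, pi):
    """IPW estimate of the average treatment effect given propensity scores pi."""
    return np.mean(t * y / pi - (1 - t) * y / (1 - pi))


def simulate(n=2000, d=4, n_rep=200, tau=1.0, seed=0):
    """Return (oracle, estimated) IPW estimates over n_rep simulated datasets.

    Assumed design: an intercept plus d - 1 standard normal covariates,
    a correctly specified logistic propensity model, and a constant
    treatment effect tau.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.full(d, 0.5 / np.sqrt(d - 1))
    beta_true[0] = 0.0  # intercept coefficient
    oracle = np.empty(n_rep)
    estimated = np.empty(n_rep)
    for r in range(n_rep):
        Z = rng.standard_normal((n, d - 1))
        X = np.column_stack([np.ones(n), Z])
        pi_true = sigmoid(X @ beta_true)
        t = (rng.random(n) < pi_true).astype(float)
        y0 = 5.0 + Z.sum(axis=1) + rng.standard_normal(n)
        y = y0 + tau * t  # observed outcome
        pi_hat = sigmoid(X @ fit_logistic(X, t))
        oracle[r] = ipw_ate(y, t, pi_true)
        estimated[r] = ipw_ate(y, t, pi_hat)
    return oracle, estimated


if __name__ == "__main__":
    oracle, estimated = simulate()
    print("oracle IPW:    mean", oracle.mean(), "variance", oracle.var())
    print("estimated IPW: mean", estimated.mean(), "variance", estimated.var())
```

Under these assumptions both estimators should center near `tau`, and the estimated-propensity version should show the smaller Monte Carlo variance. Increasing `d` toward $\sqrt{n}$ while holding `n` fixed is one way to probe where that advantage begins to erode.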