We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning (AutoDML). These popular doubly robust or double machine learning estimators combine outcome modeling with balancing weights, that is, weights that achieve covariate balance directly rather than by estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model whose coefficients combine the coefficients of the original outcome model with those of an unpenalized ordinary least squares (OLS) fit on the same data; in many real-world applications the augmented estimator collapses to the OLS estimate alone. We then extend these results to specific choices of outcome and weighting models. We first show that the augmented estimator that uses (kernel) ridge regression for both the outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression. This equivalence holds numerically in finite samples and lays the groundwork for a novel analysis of undersmoothing and asymptotic rates of convergence. When the weighting model is instead lasso-penalized regression, we give closed-form expressions for special cases and demonstrate a ``double selection'' property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.
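The two equivalences stated above can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code: the variable names, the simulated design, the target feature means `phi_bar`, and the unnormalized Gram-matrix convention (penalties `lam` and `delta` added directly to `Phi.T @ Phi`) are all assumptions made for illustration, and the scaling of the hyperparameters in the paper may differ. Under this convention, ridge weights take the form `w = Phi @ theta` with `theta = (G + delta*I)^{-1} phi_bar`, and the implied single ridge fit has a per-component penalty `lam*delta / (s_j + lam + delta)` in the eigenbasis of `G`, where `s_j` is the j-th eigenvalue.

```python
import numpy as np

# Simulated data (hypothetical; for illustration only).
rng = np.random.default_rng(0)
n, d = 200, 5
Phi = rng.normal(size=(n, d))                  # source-sample features
y = Phi @ rng.normal(size=d) + rng.normal(size=n)
phi_bar = rng.normal(size=d)                   # assumed target feature means

G = Phi.T @ Phi                                # unnormalized Gram matrix


def ridge(lam):
    """Ridge coefficients with penalty lam (lam = 0 gives OLS)."""
    return np.linalg.solve(G + lam * np.eye(d), Phi.T @ y)


def augmented(lam, delta):
    """Augmented estimate: outcome-model prediction plus weighted residuals."""
    beta_reg = ridge(lam)
    theta = np.linalg.solve(G + delta * np.eye(d), phi_bar)
    w = Phi @ theta                            # weights linear in the features
    return phi_bar @ beta_reg + w @ (y - Phi @ beta_reg)


# 1) Exact balance (delta = 0): the augmented estimate equals the OLS one,
#    whatever penalty the outcome model used.
est_exact = augmented(lam=10.0, delta=0.0)
est_ols = phi_bar @ ridge(0.0)

# 2) Ridge outcome model and ridge weights: the augmented estimate equals a
#    single ridge fit with a smaller, component-specific penalty gamma.
lam, delta = 10.0, 5.0
evals, V = np.linalg.eigh(G)
gamma = lam * delta / (evals + lam + delta)
beta_single = V @ ((V.T @ (Phi.T @ y)) / (evals + gamma))
est_single = phi_bar @ beta_single
est_aug = augmented(lam, delta)

print("exact balance vs OLS:", est_exact, est_ols)
print("ridge-ridge vs single ridge:", est_aug, est_single)
print("implied penalties:", gamma)
```

In this sketch every implied penalty `gamma` is smaller than both `lam` and `delta`, which is the sense in which the single equivalent ridge regression is undersmoothed relative to either input model.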