Adaptive debiased machine learning using data-driven model selection techniques

Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and the resulting estimates may then be biased due to model misspecification. To address this problem, we propose Adaptive Debiased Machine Learning (ADML), a nonparametric framework that combines data-driven model selection and debiased machine learning techniques to construct asymptotically linear, adaptive, and superefficient estimators for pathwise differentiable functionals. By learning model structure directly from data, ADML avoids the bias introduced by model misspecification and remains free from the restrictions of parametric and semiparametric models. Although ADML estimators may behave irregularly as estimators of the target parameter in a nonparametric statistical model, we show that they provide regular and locally uniformly valid inference for a projection-based oracle parameter. Importantly, this oracle parameter agrees with the original target parameter for distributions within an unknown but correctly specified oracle statistical submodel that is learned from the data. This finding implies that, in a local asymptotic sense, there is no penalty for conducting data-driven model selection compared to having prior knowledge of the oracle submodel and oracle parameter. To demonstrate the practical applicability of our theory, we provide a broad class of ADML estimators for the average treatment effect in adaptive partially linear regression models.

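For orientation, the following is a minimal sketch of a standard, non-adaptive cross-fitted debiased ML estimator of θ in a fixed partially linear model Y = θA + g(W) + ε. When that model holds and there is no unmeasured confounding, θ equals the average treatment effect. This is not the ADML procedure: the data-driven model selection step is omitted entirely. The polynomial least-squares nuisance learner, the function names (`poly_features`, `fit_predict`, `dml_partially_linear`), and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def poly_features(w: np.ndarray, degree: int = 2) -> np.ndarray:
    """Intercept plus per-column powers of w up to `degree` (hypothetical nuisance basis)."""
    cols = [np.ones(len(w))]
    for d in range(1, degree + 1):
        cols.extend((w ** d).T)  # one column per covariate per power
    return np.column_stack(cols)


def fit_predict(x_train, y_train, x_test):
    """Least-squares regression, standing in for a flexible ML learner."""
    coef, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
    return x_test @ coef


def dml_partially_linear(y, a, w, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*A + g(W) + eps."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    x = poly_features(w)
    y_res = np.empty(n)
    a_res = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Out-of-fold residuals from estimates of E[Y | W] and E[A | W].
        y_res[test] = y[test] - fit_predict(x[train], y[train], x[test])
        a_res[test] = a[test] - fit_predict(x[train], a[train], x[test])
    theta = np.sum(a_res * y_res) / np.sum(a_res ** 2)
    # Estimated influence function, used for a Wald-type standard error.
    psi = a_res * (y_res - theta * a_res) / np.mean(a_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / n)
    return theta, se


# Simulated example with a true effect of 2.0 (hypothetical data).
rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=(n, 2))
a = 0.5 * w[:, 0] + rng.normal(size=n)
y = 2.0 * a + w[:, 0] ** 2 + w[:, 1] + rng.normal(size=n)
theta, se = dml_partially_linear(y, a, w)
print(f"theta = {theta:.3f}, 95% CI = ({theta - 1.96 * se:.3f}, {theta + 1.96 * se:.3f})")
```

Residualizing both Y and A on W with out-of-fold predictions is what makes the estimate insensitive to first-order errors in the nuisance fits. If the partially linear model is misspecified, θ instead targets a weighted average of conditional effects rather than the ATE, which is the kind of misspecification bias that motivates learning the model structure from data.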