Estimating the conditional mean function is a central task in statistical learning. In this paper, we consider estimation and inference for a nonparametric class of real-valued càdlàg functions with bounded sectional variation (Gill et al., 1995), using the Highly Adaptive Lasso (HAL) (van der Laan, 2015; Benkeser and van der Laan, 2016; van der Laan, 2023), a flexible empirical risk minimizer over linear combinations of tensor products of zero- or higher-order spline basis functions under an L1-norm constraint. Building on recent theoretical advances in asymptotic normality and uniform convergence rates for higher-order spline HAL estimators, this work focuses on constructing robust confidence intervals for HAL-based estimators of conditional means. First, we propose a targeted HAL with a debiasing step that removes the regularization bias in the targeted conditional mean, and we also consider a relaxed HAL estimator that reduces this bias within the working model. Second, we propose both global and local undersmoothing strategies that adaptively enlarge the working model and further reduce bias relative to variance. Third, we combine these estimation strategies with delta-method-based variance estimators to construct confidence intervals for the conditional mean. Through extensive simulation studies, we evaluate different combinations of our estimation procedures, model selection strategies, and confidence-interval constructions. The results show that our proposed approaches substantially reduce bias relative to variance and yield confidence intervals with coverage rates close to nominal levels across different scenarios. Finally, we demonstrate the general applicability of our framework by estimating conditional average treatment effect (CATE) functions, highlighting how HAL-based inference methods extend to other infinite-dimensional, non-pathwise-differentiable parameters.
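As a rough illustration of the ingredients named in the abstract, the following is a minimal one-dimensional sketch, not the authors' implementation. It builds a zero-order spline (indicator) basis, selects basis functions with an L1-penalized least-squares fit, refits the selected working model without penalty (a relaxed fit), and forms a Wald interval from a sandwich-type plug-in variance. Several things here are assumptions made for illustration: the function names `zero_order_basis`, `lasso_cd`, and `relaxed_hal_ci` are hypothetical, the penalty `lam` is fixed by hand rather than chosen by cross-validation or undersmoothing, the knots are placed at sample quantiles, and the variance treats the selected working model as fixed. The targeting step and the tensor-product basis for multivariate covariates are omitted.

```python
import numpy as np

def zero_order_basis(x, knots):
    """Zero-order spline basis: column j is the indicator 1{x >= knots[j]}."""
    return (np.asarray(x, dtype=float)[:, None] >= knots[None, :]).astype(float)

def lasso_cd(Phi, y, lam, n_sweeps=200):
    """Coordinate descent for (1/(2n))||y - b0 - Phi b||^2 + lam * ||b||_1.

    The intercept b0 is unpenalized; a fixed number of sweeps is used
    instead of a convergence check to keep the sketch short.
    """
    n, p = Phi.shape
    beta = np.zeros(p)
    b0 = y.mean()
    resid = y - b0
    col_sq = (Phi ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Phi[:, j] * beta[j]      # remove column j's contribution
            rho = Phi[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= Phi[:, j] * beta[j]      # add it back with the new coefficient
        shift = resid.mean()                  # re-centre via the intercept
        b0 += shift
        resid -= shift
    return b0, beta

def relaxed_hal_ci(x, y, x0, lam=0.01, n_knots=40, z=1.96):
    """Point estimate, standard error, and Wald interval for E[Y | X = x0]."""
    knots = np.quantile(x, np.linspace(0.02, 0.98, n_knots))
    Phi = zero_order_basis(x, knots)
    _, beta = lasso_cd(Phi, y, lam)
    active = np.flatnonzero(beta != 0.0)      # basis functions kept by the lasso

    # Relaxed step: unpenalized least squares on intercept + selected indicators.
    X = np.column_stack([np.ones(len(x)), Phi[:, active]])
    bread = np.linalg.pinv(X.T @ X)
    coef = bread @ X.T @ y
    resid = y - X @ coef

    # Sandwich covariance of the coefficients, working model treated as fixed.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread

    # The prediction at x0 is linear in the coefficients, so its variance
    # is the quadratic form phi0' cov phi0.
    phi0 = np.concatenate([[1.0], zero_order_basis(np.array([x0]), knots)[0, active]])
    est = float(phi0 @ coef)
    se = float(np.sqrt(phi0 @ cov @ phi0))
    return est, se, est - z * se, est + z * se

# Toy data: the true conditional mean is sin(2*pi*x), which equals 1 at x0 = 0.25.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 400)
est, se, lo, hi = relaxed_hal_ci(x, y, x0=0.25)
print(f"estimate {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Shrinking `lam` below the value a cross-validation selector would pick enlarges the active set, which is the basic idea behind undersmoothing: the working model grows, so bias falls relative to variance. How far to shrink it, globally or locally, is exactly the choice the paper's strategies address and is not attempted in this sketch.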