Super Ensemble Learning Using the Highly-Adaptive-Lasso

We consider estimation of a functional parameter of a realistically modeled data distribution based on independent and identically distributed observations. Suppose that the true function is defined as the minimizer of the expectation of a specified loss function over its parameter space. We are provided $J$ estimators of the true function, which we view as a data-adaptive coordinate transformation for the true function. For any real-valued cadlag function of $J$ variables with finite sectional variation norm, we define a candidate ensemble estimator as the mapping from the data into the composition of that cadlag function and the $J$ estimated functions. Using $V$-fold cross-validation, we define the cross-validated empirical risk of each such cadlag-function-specific ensemble estimator. We then define the Meta Highly Adaptive Lasso Minimum Loss Estimator (M-HAL-MLE) as the cadlag function that minimizes this cross-validated empirical risk over all cadlag functions whose sectional variation norm is bounded by a fixed constant. For each of the $V$ training samples, this yields a composition of the M-HAL-MLE ensemble and the $J$ estimated functions trained on that training sample. We estimate the true function with the average of these $V$ estimated functions, which we call the M-HAL super-learner. The M-HAL super-learner converges to the oracle estimator at the rate $n^{-2/3}$ (up to a $\log n$ factor) with respect to excess risk, where the oracle estimator minimizes the excess risk among all considered ensembles. The excess risk of the oracle estimator relative to the true function is generally second order. Under weak conditions on the $J$ candidate estimators, target features of the undersmoothed M-HAL super-learner are asymptotically linear estimators of the corresponding target features of the true function, with influence curve equal to either the efficient influence curve or, potentially, a super-efficient influence curve.
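To make the construction concrete, here is a minimal Python sketch of the M-HAL super-learner for squared-error loss and $J = 2$ candidate estimators of a univariate regression function. Several pieces are assumptions for illustration and are not taken from the abstract: the two candidate learners (`fit_linear`, `fit_step`) are hypothetical toys; the bound on the sectional variation norm is replaced by its Lagrangian form, an $L_1$ penalty `lam` on the coefficients of zero-order HAL indicator basis functions $\phi_{s,u}(z) = \prod_{j \in s} 1\{z_j \ge u_j\}$ with knots $u$ at the cross-validated meta-features; and the lasso is solved by plain coordinate descent. The meta-fit is trained on the out-of-fold candidate predictions, so with equal fold sizes the loss it minimizes is the cross-validated empirical risk, and the returned predictor averages the $V$ fold-specific compositions.

```python
import itertools


def soft_threshold(a, lam):
    # Proximal operator of lam * |b|: shrinks a toward zero by lam.
    return max(a - lam, 0.0) - max(-a - lam, 0.0)


def hal_design(Z, knots):
    # Zero-order HAL basis: one indicator per (knot u, nonempty subset s),
    # phi_{s,u}(z) = prod_{j in s} 1{z_j >= u_j}.
    J = len(knots[0])
    subsets = [s for r in range(1, J + 1) for s in itertools.combinations(range(J), r)]
    return [[float(all(z[j] >= u[j] for j in s)) for u in knots for s in subsets]
            for z in Z]


def lasso_cd(X, y, lam, n_iter=100):
    # Coordinate descent for (1/2n)||y - b0 - X beta||^2 + lam * ||beta||_1,
    # with an unpenalized intercept b0.
    n, p = len(X), len(X[0])
    cols = [[X[i][k] for i in range(n)] for k in range(p)]
    sq = [sum(c * c for c in col) / n for col in cols]
    beta = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for k in range(p):
            if sq[k] == 0.0:
                continue
            col = cols[k]
            rho = sum(c * r for c, r in zip(col, resid)) / n + sq[k] * beta[k]
            delta = soft_threshold(rho, lam) / sq[k] - beta[k]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[k] += delta
        shift = sum(resid) / n  # re-center the intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta


def fit_linear(xs, ys):
    # Hypothetical candidate 1: ordinary least squares line in x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_step(xs, ys):
    # Hypothetical candidate 2: one split at the training median of x.
    c = sorted(xs)[len(xs) // 2]
    lo = [y for x, y in zip(xs, ys) if x < c]
    hi = [y for x, y in zip(xs, ys) if x >= c]
    m_lo, m_hi = sum(lo) / len(lo), sum(hi) / len(hi)
    return lambda x: m_lo if x < c else m_hi


def m_hal_super_learner(xs, ys, learners, V=5, lam=0.01):
    n = len(xs)
    folds = [list(range(v, n, V)) for v in range(V)]  # validation indices per fold
    fold_fits, Z = [], [None] * n
    for val in folds:
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        # Train the J candidates on this fold's training sample.
        fitted = [fit([xs[i] for i in train], [ys[i] for i in train]) for fit in learners]
        fold_fits.append(fitted)
        # Out-of-fold candidate predictions serve as the meta-features.
        for i in val:
            Z[i] = [f(xs[i]) for f in fitted]
    # With equal fold sizes, the pooled out-of-fold squared-error risk equals
    # the cross-validated empirical risk averaged over the V folds.
    b0, beta = lasso_cd(hal_design(Z, Z), ys, lam)

    def meta(z):
        return b0 + sum(b * x for b, x in zip(beta, hal_design([z], Z)[0]))

    # Average the V compositions meta(psi_{1,v}(x), ..., psi_{J,v}(x)).
    return lambda x: sum(meta([f(x) for f in fitted]) for fitted in fold_fits) / V


# Toy, noise-free data y = 2x (illustrative only).
xs = [i / 39 for i in range(40)]
ys = [2 * x for x in xs]
predict = m_hal_super_learner(xs, ys, [fit_linear, fit_step], V=5, lam=0.01)
print(round(predict(0.3), 2), round(predict(0.7), 2))
```

In this sketch `lam` is fixed by hand. Selecting it by cross-validation, or decreasing it below the cross-validated choice to undersmooth, would be natural variants but are not shown here.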

翻译：我们考虑基于独立同分布观测对实际建模数据分布的函数参数进行估计。假设真实函数被定义为在其参数空间上指定损失函数期望的最小化者。我们提供了真实函数的估计量，这些估计量被视为真实函数的数据自适应坐标变换。对于任意具有有穷截面变差范数的$J$维实值cadlag函数，我们将候选集成估计量定义为从数据到该cadlag函数与$J$个估计函数复合结果的映射。利用$V$折交叉验证，我们定义了每个cadlag函数特定集成估计量的交叉验证经验风险。接着，我们将元高度自适应Lasso最小损失估计量（M-HAL-MLE）定义为在所有截面变差范数具有一致有界的cadlag函数中最小化该交叉验证经验风险的函数。对于$V$个训练样本中的每一个，这产生了M-HAL-MLE集成与基于训练样本训练的$J$个估计函数的复合。我们可以通过这$V$个估计函数的平均值来估计真实函数，我们称之为M-HAL超级学习器。M-HAL超级学习器以$n^{-2/3}$的速率（忽略$\log n$因子）在过量风险意义上收敛到先知估计量，其中先知估计量在所有考虑的集成中最小化过量风险。先知估计量与真实函数之间的过量风险通常是二阶的。在$J$个候选估计量的弱条件下，平滑不足的M-HAL超级学习器的目标特征是关于真实函数相应目标特征的渐近线性估计量，其影响曲线要么是有效影响曲线，要么可能是超有效影响曲线。