Higher Order Spline Highly Adaptive Lasso Estimators of Functional Parameters: Pointwise Asymptotic Normality and Uniform Convergence Rates

We consider estimation of a functional of the data distribution based on i.i.d. observations. We assume the target function can be defined as the minimizer of the expected loss over a class of $d$-variate real-valued cadlag functions with finite sectional variation norm. For each $k=0,1,\ldots$, we define the $k$-th order smoothness class as the set of $d$-variate functions on the unit cube for which each of the sequentially defined Radon-Nikodym derivatives with respect to Lebesgue measure, up to order $k$, is cadlag and of bounded variation. For a target function in this $k$-th order smoothness class, we represent it as an infinite linear combination of tensor products of spline basis functions of order $\leq k$, each indexed by a knot point. The spline basis functions of order lower than $k$ are used to represent the function at the $0$-edges. The $L_1$-norm of the coefficients represents the sum of the variation norms of all the $k$-th order derivatives; we call this sum the $k$-th order sectional variation norm of the target function. This generalizes the zero-order spline representation of cadlag functions with bounded sectional variation norm to higher-order smoothness classes. We use this $k$-th order spline representation to define the $k$-th order spline sieve minimum loss estimator (MLE), the Highly Adaptive Lasso (HAL) MLE, and the Relax HAL-MLE. For first- and higher-order smoothness classes, we analyze these three classes of estimators and establish pointwise asymptotic normality and uniform convergence at the dimension-free rate $n^{-k^*/(2k^*+1)}$, up to a power of $\log n$ that depends on the dimension, where $k^*=k+1$. These results assume that appropriate undersmoothing is used in selecting the $L_1$-norm. We also establish asymptotic linearity of plug-in estimators of pathwise differentiable features of the target function.
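To make the spline representation and the HAL-MLE concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation. It assumes knots placed at the observed points and builds only the top-order tensor products $\prod_{j\in s}(x_j-u_j)_+^k/k!$ over nonempty coordinate subsets $s$, omitting the lower-order $0$-edge terms of the full representation. It also replaces the $L_1$-norm constraint with its penalized (Lagrangian) form at a fixed $\lambda$, so it does not perform cross-validation or undersmoothing. The fit uses ordinary cyclic coordinate descent. The names `spline_basis`, `design_matrix`, `lasso_cd`, `predict` and `rate_exponent` are illustrative only.

```python
import itertools
import math
from fractions import Fraction


def spline_basis(x, knot, subset, k):
    """Tensor product over j in `subset` of (x_j - u_j)_+^k / k!.

    For k = 0 each factor is the indicator 1{x_j >= u_j}, i.e. the zero-order HAL basis.
    """
    value = 1.0
    for j in subset:
        diff = x[j] - knot[j]
        if k == 0:
            value *= 1.0 if diff >= 0 else 0.0
        else:
            value *= max(diff, 0.0) ** k / math.factorial(k)
    return value


def design_matrix(points, knots, k):
    """One column per (nonempty coordinate subset, knot) pair, evaluated at each point."""
    d = len(knots[0])
    subsets = [s for r in range(1, d + 1) for s in itertools.combinations(range(d), r)]
    columns = [(s, u) for s in subsets for u in knots]
    rows = [[spline_basis(x, u, s, k) for (s, u) in columns] for x in points]
    return rows, columns


def soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - b0 - X beta||^2 + lam * ||beta||_1.

    The intercept b0 is left unpenalized.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        shift = sum(resid) / n  # exact minimization over the intercept, beta held fixed
        intercept += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:  # basis is zero at every point: leave beta_j at 0
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return intercept, beta


def predict(x, intercept, beta, columns, k):
    return intercept + sum(b * spline_basis(x, u, s, k)
                           for b, (s, u) in zip(beta, columns) if b != 0.0)


def rate_exponent(k):
    """r in n^{-r}, with r = k*/(2k* + 1) and k* = k + 1 (log n factors ignored)."""
    k_star = k + 1
    return Fraction(k_star, 2 * k_star + 1)


# Toy 1-d demo: first-order (k = 1) splines with knots at the observed points.
points = [[i / 10] for i in range(11)]
y = [2.0 * x[0] for x in points]
X, columns = design_matrix(points, points, k=1)
b0, beta = lasso_cd(X, y, lam=1e-3)
fitted = [predict(x, b0, beta, columns, k=1) for x in points]
train_mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y)
mean_y = sum(y) / len(y)
var_y = sum((yi - mean_y) ** 2 for yi in y) / len(y)
print(f"training MSE {train_mse:.2e} vs intercept-only MSE {var_y:.2e}")
print("rate exponent for k = 1:", rate_exponent(1))
```

Setting `k=0` recovers the indicator basis of zero-order HAL, with exponent $1/3$. A Relax HAL-MLE would additionally refit the columns selected by the lasso without the penalty, which this sketch does not do.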
