This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ with penalty $V^{d, s}_{\infty\text{-XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ and $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ in terms of Hardy--Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}}: V^{d, s}_{\infty\text{-XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
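For concreteness, the two estimation problems referenced above can be written schematically as follows. This is a sketch under the natural reading of the abstract: squared-error loss over data $(x_1, y_1), \dots, (x_n, y_n)$, with $\lambda$ a generic tuning parameter and $\hat{f}^{\mathrm{pen}}_n$, $\hat{f}^{\mathrm{LS}}_n$ our own shorthand; the precise definitions of $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ and $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ are given in the body of the paper.
\[
\hat{f}^{\mathrm{pen}}_n \in \operatorname*{arg\,min}_{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 + \lambda \, V^{d, s}_{\infty\text{-XGB}}(f) \right\},
\qquad
\hat{f}^{\mathrm{LS}}_n \in \operatorname*{arg\,min}_{\substack{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}} \\ V^{d, s}_{\infty\text{-XGB}}(f) \le V}} \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 .
\]
The first display is the penalized problem whose optimizers are shown to coincide with those of the XGBoost objective; the second is the constrained least squares estimator for which the near-minimax rate $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$ is established.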