Gradient boosting performs exceptionally well on most prediction problems and scales well to large datasets. In this paper we prove that a ``lassoed'' gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L_2$ convergence in the large nonparametric space of cadlag functions of bounded sectional variation. This rate is remarkable because it does not depend on the dimension, sparsity, or smoothness. We use simulations and real data to confirm our theory and to demonstrate empirical performance and scalability on par with standard boosting. Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers over nested Donsker classes.
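
One way to read the rate claim is sketched below. The notation is assumed for illustration and is not fixed by the text above: $\hat f_n$ denotes the early-stopped estimator fit on $n$ observations, $f_0$ the target function (the minimizer of the true risk), and $P_0$ the data-generating distribution.

\[
  \| \hat f_n - f_0 \|_{L_2(P_0)} = o_P\!\left( n^{-1/4} \right)
\]

Under this reading, the exponent $-1/4$ is the same regardless of the dimension of the covariates.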