Regression trees are among the most interpretable yet expressive model classes in machine learning. Historically, greedy induction has been the dominant approach for constructing well-performing regression trees. Optimal methods based on dynamic programming and branch-and-bound exist and often substantially outperform greedy approaches, but they are computationally prohibitive for general linear regression trees. Recent work has shown that specialized lookahead strategies can dramatically reduce runtime while maintaining near-optimal performance, though primarily in classification settings. In this work, we develop a novel algorithm for near-optimal, sparse, piecewise linear regression trees that combines a lookahead-style search strategy with efficient rank-one Cholesky updates of the Gram matrix. We demonstrate, both theoretically and empirically, that our method achieves a favorable trade-off between computational efficiency, predictive accuracy, and sparsity, and that it scales significantly better than the current state of the art.
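The abstract does not spell out the update itself. The following is a minimal sketch of the classical rank-one Cholesky update/downdate, not necessarily the authors' implementation. It assumes that each leaf's linear model is fit by least squares from the Gram matrix $X_S^\top X_S + \lambda I$ of the samples $S$ in that node, with a small ridge term $\lambda$ (an assumption) keeping the matrix positive definite. Moving one sample $x$ across a candidate split changes the two children's Gram matrices by $\pm x x^\top$, so their Cholesky factors can be refreshed in $O(d^2)$ rather than refactorized in $O(d^3)$. The function name `chol_rank_one` and its `sign` parameter are hypothetical.

```python
import numpy as np


def chol_rank_one(L: np.ndarray, x: np.ndarray, sign: float = 1.0) -> np.ndarray:
    """Return the lower-triangular Cholesky factor of L @ L.T + sign * outer(x, x).

    Hypothetical helper (classical algorithm, not the paper's code).
    sign=+1.0 is a rank-one update (a sample joins a node);
    sign=-1.0 is a downdate (a sample leaves it). Runs in O(d^2).
    A downdate must leave the matrix positive definite.
    """
    L = np.array(L, dtype=float, copy=True)
    x = np.array(x, dtype=float, copy=True)
    d = x.shape[0]
    for k in range(d):
        r2 = L[k, k] ** 2 + sign * x[k] ** 2
        if r2 <= 0.0:
            raise np.linalg.LinAlgError("downdate would lose positive definiteness")
        r = np.sqrt(r2)
        c = r / L[k, k]  # rotation "cosine"
        s = x[k] / L[k, k]  # rotation "sine"
        L[k, k] = r
        if k + 1 < d:
            # Update the rest of column k, then fold the residual into x
            # (uses the already-updated column, as the recurrence requires).
            L[k + 1:, k] = (L[k + 1:, k] + sign * s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 3))
    lam = 1e-6  # assumed small ridge term so both factors stay positive definite
    eye = np.eye(3)

    # Initial split: first 5 samples go left, the remaining 5 go right.
    left, right = X[:5], X[5:]
    L_left = np.linalg.cholesky(left.T @ left + lam * eye)
    L_right = np.linalg.cholesky(right.T @ right + lam * eye)

    # Shift the split by one sample: move right[0] into the left child.
    x = right[0]
    L_left = chol_rank_one(L_left, x, +1.0)
    L_right = chol_rank_one(L_right, x, -1.0)

    # Compare against factorizing the new children from scratch.
    new_left, new_right = X[:6], X[6:]
    print(np.allclose(L_left, np.linalg.cholesky(new_left.T @ new_left + lam * eye)))
    print(np.allclose(L_right, np.linalg.cholesky(new_right.T @ new_right + lam * eye)))
```

Sweeping a threshold over sorted feature values moves samples one at a time between the children, so each candidate split costs one update and one downdate instead of two fresh factorizations. The refreshed factors can then be used with two triangular solves to obtain each child's least-squares coefficients.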