While globally optimal empirical risk minimization (ERM) decision trees have become computationally feasible and empirically successful, rigorous theoretical guarantees for their statistical performance remain limited. In this work, we develop a comprehensive statistical theory for ERM trees under random design in both high-dimensional regression and classification. We first establish sharp oracle inequalities that bound the excess risk of the ERM estimator relative to the best approximation achievable by any tree with at most $L$ leaves, thereby characterizing the interpretability-accuracy trade-off. These results rest on a novel uniform concentration framework based on empirically localized Rademacher complexity. We then derive minimax optimal rates over a new function class, the piecewise sparse heterogeneous anisotropic Besov (PSHAB) space, which explicitly captures three structural features encountered in practice: sparsity, anisotropic smoothness, and spatial heterogeneity. While our main results are established under sub-Gaussianity, we also provide robust guarantees that hold under heavy-tailed noise. Together, these findings provide a principled foundation for the optimality of ERM trees and introduce empirical process tools broadly applicable to other highly adaptive, data-driven procedures.