Statistical Advantages of Oblique Randomized Decision Trees and Forests

This work studies the statistical advantages of using features composed of general linear combinations of covariates to partition the data in randomized decision tree and forest regression algorithms. Using random tessellation theory from stochastic geometry, we provide a theoretical analysis of a class of efficiently generated random tree and forest estimators that allow oblique splits along such features. We call these estimators oblique Mondrian trees and forests, because each tree is generated by first selecting a set of features from linear combinations of the covariates and then running a Mondrian process that hierarchically partitions the data along these features. Generalization error bounds and convergence rates are obtained for the flexible dimension reduction model class of ridge functions (also known as multi-index models), in which the output is assumed to depend only on a low-dimensional relevant feature subspace of the input domain. The results show how the risk of these estimators depends on the choice of features and quantify how robust the risk is to errors in the estimation of the relevant features. The asymptotic analysis also provides conditions on the features selected for splitting under which these estimators attain minimax optimal rates of convergence with respect to the dimension of the relevant feature subspace. Additionally, a lower bound on the risk of axis-aligned Mondrian trees (where the features are restricted to the covariates themselves) is obtained, proving that these estimators are in general suboptimal for these linear dimension reduction models, no matter how the distribution over the covariates used to divide the data at each tree node is weighted.
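The two-stage construction described above (choose linear features, then run a Mondrian process along them) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The data are first projected onto the chosen features, and the standard Mondrian process (exponential split times with rate equal to the cell's linear dimension, a split direction chosen with probability proportional to side length, and a uniform split location) is then run on the bounding box of the projected training data. In feature coordinates the cells are boxes; in the original covariate space they are oblique polytopes. Several details are assumptions made for this sketch: each tree predicts by averaging the responses in the leaf cell containing the query point, empty cells fall back to the global mean, the helper names `sample_mondrian`, `leaf_id` and `oblique_mondrian_forest_predict` are hypothetical, and the single-index ridge function used as test data is a made-up example.

```python
import numpy as np


def sample_mondrian(lower, upper, lifetime, rng, time=0.0):
    """Recursively sample a Mondrian partition of the box [lower, upper].

    Leaves are {"leaf": True}; internal nodes store the split dimension,
    the split location and the two child subtrees.
    """
    widths = upper - lower
    rate = widths.sum()  # linear dimension of the cell
    # Time of the next cut: exponential with rate equal to the linear dimension.
    split_time = time + rng.exponential(1.0 / rate) if rate > 0 else np.inf
    if split_time > lifetime:
        return {"leaf": True}
    # Cut direction chosen with probability proportional to side length.
    dim = rng.choice(len(widths), p=widths / rate)
    loc = rng.uniform(lower[dim], upper[dim])  # uniform cut location
    left_upper, right_lower = upper.copy(), lower.copy()
    left_upper[dim] = loc
    right_lower[dim] = loc
    return {
        "leaf": False, "dim": dim, "loc": loc,
        "left": sample_mondrian(lower, left_upper, lifetime, rng, split_time),
        "right": sample_mondrian(right_lower, upper, lifetime, rng, split_time),
    }


def leaf_id(tree, z):
    """Return the root-to-leaf path (a hashable key) of the cell containing z."""
    path = []
    while not tree["leaf"]:
        go_left = bool(z[tree["dim"]] <= tree["loc"])
        path.append(go_left)
        tree = tree["left"] if go_left else tree["right"]
    return tuple(path)


def oblique_mondrian_forest_predict(X, y, X_new, features, lifetime,
                                    n_trees=10, seed=0):
    """Forest estimate at the rows of X_new (hypothetical helper).

    features: (k, d) array whose rows are the linear combinations of the
    covariates along which the data is split. np.eye(d) gives an
    axis-aligned Mondrian forest.
    """
    rng = np.random.default_rng(seed)
    Z, Z_new = X @ features.T, X_new @ features.T  # project onto the features
    lower, upper = Z.min(axis=0), Z.max(axis=0)    # assumed domain: bounding box
    preds = np.zeros(len(X_new))
    for _ in range(n_trees):
        tree = sample_mondrian(lower, upper, lifetime, rng)
        cells = {}
        for z, yi in zip(Z, y):
            cells.setdefault(leaf_id(tree, z), []).append(yi)
        for i, z in enumerate(Z_new):
            vals = cells.get(leaf_id(tree, z))
            # Empty cell: fall back to the global mean (assumed convention).
            preds[i] += np.mean(vals) if vals else y.mean()
    return preds / n_trees  # forest = average of the tree estimates


# Usage on a hypothetical ridge function y = g(a^T x) + noise with d = 5.
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 5))
a = np.ones(5) / np.sqrt(5)  # hypothetical relevant direction
y = np.sin(3 * X @ a) + 0.1 * rng.standard_normal(500)
X_test = rng.uniform(size=(200, 5))
f_test = np.sin(3 * X_test @ a)

oblique = oblique_mondrian_forest_predict(X, y, X_test, a[None, :], lifetime=20.0)
axis = oblique_mondrian_forest_predict(X, y, X_test, np.eye(5), lifetime=3.0)
print("oblique MSE:", np.mean((oblique - f_test) ** 2))
print("axis-aligned MSE:", np.mean((axis - f_test) ** 2))
```

Passing the identity matrix as `features` recovers an axis-aligned Mondrian forest, so both estimators can be compared on the same ridge-function data. The lifetimes above are arbitrary illustrative values, not tuned or taken from the analysis.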

翻译：本研究探讨在随机决策树与森林回归算法中，采用协变量线性组合特征进行数据划分的统计优势。基于随机几何中的随机镶嵌理论，我们对一类可高效生成的随机树与森林估计器进行理论分析，该类估计器允许沿此类特征进行斜向划分。我们将这些估计器称为斜向Mondrian树与森林，其生成过程为：首先从协变量的线性组合中选取特征集，随后沿这些特征运行Mondrian过程进行层次化数据划分。针对岭函数（亦称多指标模型）这类灵活降维模型类——其输出假定依赖于输入域的低维相关特征子空间——我们获得了泛化误差界与收敛速率。研究结果揭示了这些估计器的风险如何依赖于特征选择，并量化了风险对相关特征估计误差的鲁棒性。渐近分析进一步给出了数据划分特征的选择条件，使这些估计器能够获得关于相关特征子空间维度的极小极大最优收敛速率。此外，研究获得了轴对齐Mondrian树（其特征被限制在协变量集合内）的风险下界，证明对于一般线性降维模型，无论各树节点划分数据时采用何种协变量分布加权方式，此类估计器均非最优。