Random forests are a popular class of algorithms for regression and classification. The algorithm introduced by Breiman in 2001, and many of its variants, are ensembles of randomized decision trees built from axis-aligned partitions of the feature space. One such variant, called Mondrian forests, was proposed to handle the online setting and is the first class of random forests for which minimax optimal rates were obtained in arbitrary dimension. However, the restriction to axis-aligned splits fails to capture dependencies between features, and random forests that use oblique splits have shown improved empirical performance on many tasks. In this work, we show that a large class of random forests with general split directions also achieves minimax optimal convergence rates in arbitrary dimension. This class includes STIT forests, a generalization of Mondrian forests to arbitrary split directions, as well as random forests derived from Poisson hyperplane tessellations. These are the first results showing that random forest variants with oblique splits can attain minimax optimality in arbitrary dimension. Our proof technique relies on a novel application of the theory of stationary random tessellations from stochastic geometry to statistical learning theory.