In this paper, we propose a new random forest algorithm that constructs trees using a novel adaptive split-balancing method. Rather than relying on the widely used random feature selection, we propose a permutation-based balanced splitting criterion. The adaptive split balancing forest (ASBF) achieves minimax optimality over the Lipschitz class. Its localized version, which fits local regressions at the leaf level, attains the minimax rate over the broad H\"older class $\mathcal{H}^{q,\beta}$ for any $q\in\mathbb{N}$ and $\beta\in(0,1]$. We show that over-reliance on auxiliary randomness in tree construction may compromise the approximation power of trees, leading to suboptimal results. Conversely, the proposed less-random, permutation-based approach demonstrates optimality over a wide range of models. Although random forests are known to perform well empirically, their theoretical convergence rates are slow. Simplified versions that construct trees independently of the data offer faster rates but lose adaptivity during tree growth. Our proposed method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data. Additionally, we establish uniform upper bounds and show that ASBF improves dimensionality dependence in average treatment effect estimation problems. Simulation studies and real-world applications demonstrate our methods' superior performance over existing random forests.