While random forests are commonly used for regression problems, existing methods often lack adaptability in complex situations or lose optimality in simple, smooth scenarios. In this study, we introduce the adaptive split balancing forest (ASBF), which learns tree representations from data while simultaneously achieving minimax optimality under the Lipschitz class. To exploit higher-order smoothness, we further propose a localized version that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$ for any $q\in\mathbb{N}$ and $\beta\in(0,1]$. Rather than relying on the widely used random feature selection, we consider a balanced modification of existing approaches. Our results indicate that over-reliance on auxiliary randomness may compromise the approximation power of tree models, leading to suboptimal results, whereas a less random, more balanced approach attains optimality. Additionally, we establish uniform upper bounds and explore the application of random forests to average treatment effect estimation. Through simulation studies and real-data applications, we demonstrate the superior empirical performance of the proposed methods over existing random forests.
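For orientation, the minimax rates referred to above can be written out as the classical nonparametric regression rates. This is a sketch under assumptions the abstract does not state: $n$ denotes the sample size, $d$ the covariate dimension, $m$ the regression function, $\hat m$ any estimator, the risk is integrated squared error, and $\mathcal{H}^{q,\beta}$ is taken to have total smoothness $q+\beta$.

$$
\inf_{\hat m}\,\sup_{m\in\mathcal{H}^{q,\beta}} \mathbb{E}\,\|\hat m - m\|_2^2 \;\asymp\; n^{-\frac{2(q+\beta)}{d+2(q+\beta)}}
$$

Under the same assumptions, the Lipschitz class corresponds to total smoothness $1$, which gives the rate $n^{-2/(d+2)}$. Higher total smoothness brings the exponent closer to the parametric rate $n^{-1}$, which is what exploiting higher-order smoothness would buy.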