A random forest (RF) aggregates its trees uniformly and globally, which leaves conditional bias along the decision boundary uncorrected. To correct this bias locally, we propose exploiting the structural pattern of each tree's decision path. At inference, a random forest reaches its prediction through the root-to-leaf path that the sample traverses in each tree, so path-level reliability offers a finer granularity than tree-level weighting can access. We show that reliability varies meaningfully across path patterns in the boundary region identified by the forest itself, and that using this signal yields a statistically significant accuracy improvement over RF on 36 binary classification benchmarks (Wilcoxon p < 0.0001). We further devise a measure of how much residual information remains in the fitted RF's decision boundary, which provides an estimate of the expected gain before the method is applied; on the qualifying group identified this way, the method delivers a mean accuracy improvement of +0.99 pp with strict wins on every dataset (7/0/0 wins/ties/losses). We also measure class-recall regression, the typical failure mode of RF correction methods: at the 0.2 pp threshold there are zero minority-recall regressions and a single majority-recall regression, indicating that the correction operates in the direction of bias reduction rather than class trade-off. Our work suggests that the structural information of decision paths, previously overlooked in random forest research, can improve RF performance.
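To make the notion of path-level reliability concrete, the following is a minimal sketch and not the method proposed here. It assumes a hypothetical toy tree encoded as nested tuples, a path signature made of (feature, branch) pairs, and reliability defined as held-out accuracy per path; the abstract does not specify any of these, and the names `decision_path` and `path_reliability` are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy encoding (an assumption for illustration):
#   internal node: (feature_index, threshold, left_subtree, right_subtree)
#   leaf node:     ("leaf", predicted_class)

def decision_path(tree, x):
    """Return (path signature, leaf prediction) for one sample x."""
    path = []
    node = tree
    while node[0] != "leaf":
        feature, threshold, left, right = node
        go_left = x[feature] <= threshold
        # Record which feature was tested and which branch was taken.
        path.append((feature, "L" if go_left else "R"))
        node = left if go_left else right
    return tuple(path), node[1]

def path_reliability(tree, samples, labels):
    """Held-out accuracy per root-to-leaf path (one assumed definition of reliability)."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        path, pred = decision_path(tree, x)
        counts[path] += 1
        hits[path] += int(pred == y)
    return {path: hits[path] / counts[path] for path in counts}

# A depth-2 tree over two features, invented for this example.
toy_tree = (0, 0.5,
            ("leaf", 0),
            (1, 0.5, ("leaf", 0), ("leaf", 1)))

# Invented held-out samples and labels.
held_out_x = [(0.2, 0.9), (0.3, 0.1), (0.8, 0.2), (0.9, 0.4), (0.7, 0.8), (0.6, 0.9)]
held_out_y = [0, 0, 0, 1, 1, 1]

reliability = path_reliability(toy_tree, held_out_x, held_out_y)
for path, score in sorted(reliability.items()):
    print(path, score)
```

In this toy data, one path is right only half the time while the other two are always right, which is the kind of per-path variation a tree-level weight cannot express. A real forest would repeat the tally for every tree and restrict it to boundary-region samples, and how that region and the path patterns are defined is left to the full method.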