In this paper, we quantitatively assess the error estimates of classification Random Forests. Building on the initial theoretical framework of Bates et al. (2023), we investigate the true error rate and the expected error rate, both theoretically and empirically, in the context of a variety of error estimation methods common to Random Forests. We show that, in the classification case, Random Forests' estimates of prediction error are closer on average to the true error rate than to the average prediction error. This is the opposite of the findings of Bates et al. (2023), which were given for logistic regression. We further show that our result holds across different error estimation strategies such as cross-validation, bagging, and data splitting.