Out-of-bag (OOB) estimation is the standard internal diagnostic for bootstrap-aggregated tree ensembles. Under the classical multinomial bootstrap, the number of distinct training observations in each replicate, $U_b$, is itself random, but its contribution to OOB-based variability has rarely been isolated empirically. We use the Sequential Bootstrap (SB) -- a resampling scheme that holds $U_b$ at a fixed target $k_n = \lfloor 0.632 n\rfloor$ -- as a controlled perturbation of the bootstrap mechanism, and ask whether stabilizing $U_b$ produces any measurable change in OOB-based diagnostics. We reproduce Breiman's five OOB experimental families on twelve synthetic and real datasets, but unlike the three-seed presentation common in this literature, we run 100 independent random seeds with 50 internal replications per seed, enabling formal paired statistical comparison (Wilcoxon signed-rank, paired-$t$, Pitman--Morgan variance test). We report three findings. First, OOB means are essentially insensitive to stabilization of $U_b$: of 57 (experiment, dataset, metric) cells under 100 seeds, only 6 reach $p<0.05$ on the paired mean comparison, and 4 of those 6 point in the opposite direction from what a three-seed reading would suggest. Second, a narrow but reproducible effect survives at the variance level: SB reduces the cross-seed standard deviation of node-level classification diagnostics on real datasets while slightly increasing it on synthetic ones (permutation $p=0.026$); the Vehicle dataset exhibits a 21% reduction in cross-seed standard deviation (Pitman--Morgan $p=0.017$). Third, several directional claims that appear stable across three seeds flip sign under 100-seed replication, illustrating the cost of underpowered replication protocols. We therefore treat SB as a diagnostic tool for probing the distinct-sample-count term in the variance of OOB estimators, not as an alternative to the classical bootstrap.
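The contrast between the two resampling schemes can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the function names are hypothetical, and the stopping rule used here (draw with replacement until exactly $k_n$ distinct indices have appeared) is one simple way to hold $U_b$ fixed, which may differ in detail from the variant the authors implement.

```python
import math
import numpy as np

def multinomial_bootstrap(n, rng):
    # Classical bootstrap: n draws with replacement, so the number of
    # distinct indices U_b varies from replicate to replicate.
    return rng.integers(0, n, size=n)

def sequential_bootstrap(n, rng, frac=0.632):
    # Assumed stopping rule (illustrative only): keep drawing with
    # replacement until k_n = floor(frac * n) distinct indices are seen.
    k_n = math.floor(frac * n)
    seen, draws = set(), []
    while len(seen) < k_n:
        i = int(rng.integers(0, n))
        draws.append(i)
        seen.add(i)
    return np.array(draws, dtype=int)

def oob_indices(n, in_bag):
    # Out-of-bag observations are those never drawn into the replicate.
    mask = np.ones(n, dtype=bool)
    mask[in_bag] = False
    return np.flatnonzero(mask)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    u_classic = [len(np.unique(multinomial_bootstrap(n, rng))) for _ in range(5)]
    u_seq = [len(np.unique(sequential_bootstrap(n, rng))) for _ in range(5)]
    print("distinct counts, classical: ", u_classic)
    print("distinct counts, sequential:", u_seq)
```

Under this sketch the distinct count fluctuates across classical replicates but is constant across sequential ones, so the OOB set size is constant as well. The total number of draws per sequential replicate becomes random instead.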