Infinite-order U-statistics (IOUS) have been used extensively with subbagging ensemble learning algorithms, such as random forests, to quantify their uncertainty. While normality results for IOUS have been studied extensively, variance estimation approaches and their theoretical properties remain mostly unexplored. Existing approaches mainly rely on the leading-term dominance property of the Hoeffding decomposition. However, this view usually leads to biased estimation when the kernel size is large or the sample size is small. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties, especially ratio consistency, have never been studied. As a result, the performance of the confidence intervals constructed from them is not guaranteed. To bridge these gaps, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading-term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence between our estimator and several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator's finite-sample performance. Extensive simulation studies show that our estimators enjoy lower bias and achieve the target coverage rates.
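
The contrast between leading-term dominance and peak-region dominance can be illustrated with the classical variance decomposition of a complete U-statistic with kernel size k and sample size n, Var(U_n) = sum over d = 1..k of w_d * xi_d^2, where xi_d^2 is the covariance between two kernel evaluations whose subsamples share d observations and w_d = C(k,d) C(n-k,k-d) / C(n,k) is a hypergeometric weight. The following is a minimal sketch that only computes these weights and locates their peak; it is not the estimator proposed here, and the function names `overlap_weights` and `peak_overlap` are hypothetical, introduced for illustration.

```python
from math import comb


def overlap_weights(n, k):
    """Return [w_0, ..., w_k] with w_d = C(k,d) * C(n-k,k-d) / C(n,k).

    w_d is the probability that two size-k subsamples drawn from n
    observations share exactly d observations (a hypergeometric law).
    Assumes 0 < k <= n.
    """
    total = comb(n, k)
    # math.comb returns 0 when k - d > n - k, so impossible overlaps get weight 0.
    return [comb(k, d) * comb(n - k, k - d) / total for d in range(k + 1)]


def peak_overlap(n, k):
    """Overlap size d >= 1 that carries the largest weight in Var(U_n)."""
    w = overlap_weights(n, k)
    return max(range(1, k + 1), key=lambda d: w[d])


if __name__ == "__main__":
    # Small kernel relative to n: the d = 1 term carries the most weight.
    print(peak_overlap(1000, 10))
    # Large kernel relative to n: the weight concentrates far from d = 1.
    print(peak_overlap(100, 50))
```

When k is small relative to n, the d = 1 term carries the largest weight among d >= 1, which is the regime in which approximating the variance by its leading term is natural. When k is comparable to n, the weights concentrate around d close to k^2/n, so an approximation built on the first term alone ignores most of the weight.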