There has been a resurgence of interest in the asymptotic normality of incomplete U-statistics, which sum over only roughly as many kernel evaluations as there are data samples, owing to their computational efficiency and their usefulness in quantifying uncertainty for ensemble-based predictions. In this paper, we focus on the normal convergence of one such construction, the incomplete U-statistic with Bernoulli sampling, based on a raw sample of size $n$ and a computational budget $N$. Under minimal moment assumptions on the kernel, we provide accompanying Berry-Esseen bounds of the natural rate $1/\sqrt{\min(N, n)}$ that characterize the accuracy of the normal approximation when $n \asymp N$, i.e., when $n$ and $N$ are of the same order in the sense that $n/N$ is bounded above and below by positive constants. Our key techniques include Stein's method specialized for so-called Studentized nonlinear statistics and an exponential lower tail bound for U-statistics with non-negative kernels.
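To make the construction concrete, here is a minimal Python sketch of an incomplete U-statistic with Bernoulli sampling. It rests on several assumptions that the abstract does not state. The kernel has order 2. Each unordered pair is retained independently with probability $p = \min(1, N/\binom{n}{2})$. The estimate is normalized by the realized number of retained pairs. The paper may instead divide by $N$ or treat general kernel orders. The function name `incomplete_u_bernoulli` is hypothetical.

```python
import itertools
import math
import random


def incomplete_u_bernoulli(data, kernel, budget, seed=None):
    """Order-2 incomplete U-statistic with Bernoulli sampling (illustrative sketch).

    Assumption: each unordered pair (i, j), i < j, is kept independently with
    probability p = min(1, budget / C(n, 2)), and the kernel values of the kept
    pairs are averaged (normalizing by the realized count, not by budget).
    """
    n = len(data)
    total_pairs = math.comb(n, 2)
    p = min(1.0, budget / total_pairs)  # inclusion probability per pair
    rng = random.Random(seed)
    total, count = 0.0, 0
    # Enumerate every pair and flip an independent Bernoulli(p) coin for it.
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:  # random() lies in [0, 1), so p = 1 keeps all pairs
            total += kernel(data[i], data[j])
            count += 1
    # No pair selected: the average is undefined.
    return total / count if count > 0 else float("nan")
```

With the variance kernel $h(x, y) = (x - y)^2/2$ and a budget of at least $\binom{n}{2}$, every pair is kept and the result coincides with the complete U-statistic, i.e., the unbiased sample variance. The loop enumerates all pairs for clarity, which costs $O(n^2)$ time. A budget-efficient version would instead draw the number of retained pairs from a binomial distribution and then sample that many distinct pairs uniformly, which yields the same distribution over retained subsets.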