There has been a resurgence of interest in the asymptotic normality of incomplete U-statistics that sum over only roughly as many kernel evaluations as there are data samples, owing to their computational efficiency and their usefulness in quantifying the uncertainty of ensemble-based predictions. In this paper, we focus on the normal convergence of one such construction, the incomplete U-statistic with Bernoulli sampling, based on a raw sample of size $n$ and a computational budget $N$ of the same order as $n$. Under a minimal third-moment assumption on the kernel, we establish an accompanying Berry-Esseen bound of the natural rate $1/\sqrt{\min(N, n)}$ that characterizes the accuracy of the normal approximation. Our key techniques include Stein's method specialized for so-called Studentized nonlinear statistics and an exponential lower tail bound for U-statistics with non-negative kernels.
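To make the construction concrete, the following is a minimal sketch of an incomplete U-statistic with Bernoulli sampling, not the paper's own code. It assumes that each size-$m$ subset is kept independently with probability $p = N / \binom{n}{m}$ and that the sum of kernel values is normalized by the realized number of selected subsets; the function name `incomplete_u_statistic`, the variance kernel, and the sample sizes are illustrative choices only.

```python
import itertools
import math
import random


def incomplete_u_statistic(data, kernel, order, budget, rng):
    """Illustrative Bernoulli-sampled incomplete U-statistic.

    Assumption: each size-`order` subset is included independently with
    probability p = budget / C(n, order), and the kernel sum is divided
    by the realized number of selected subsets.
    """
    n = len(data)
    total_subsets = math.comb(n, order)
    p = min(1.0, budget / total_subsets)  # inclusion probability

    kernel_sum = 0.0
    selected = 0
    # Enumerating every subset keeps the sketch simple; it does not
    # reflect the computational savings of a practical implementation.
    for subset in itertools.combinations(data, order):
        if rng.random() < p:  # Bernoulli(p) selection indicator
            kernel_sum += kernel(*subset)
            selected += 1

    if selected == 0:
        raise ValueError("no subsets were selected; increase the budget")
    return kernel_sum / selected, selected


def variance_kernel(x, y):
    """Order-2 kernel whose complete U-statistic is the sample variance."""
    return 0.5 * (x - y) ** 2


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]  # n = 200
# Budget N = n, i.e. of the same order as the sample size.
estimate, n_selected = incomplete_u_statistic(
    sample, variance_kernel, order=2, budget=200, rng=rng
)
```

With `budget` at least $\binom{n}{m}$ every subset is selected and the sketch reduces to the complete U-statistic. With `budget` close to $n$, only about $n$ of the $\binom{n}{m}$ kernel evaluations enter the average, which is the regime the abstract describes.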