We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with the sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and an asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of the individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we use our theory to study two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to analyze. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.
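For readers less familiar with the objects involved, the following is a minimal sketch of a degree-two U-statistic, instantiated with the standard unbiased MMD estimator. The Gaussian kernel, the function names (`gaussian_kernel`, `mmd_core`, `u_statistic`, `simulate_mmd`), and the mean-shift simulation are illustrative assumptions, not details taken from the abstract.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel; the kernel choice is an assumption for illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_core(z1, z2, bandwidth):
    """Symmetric degree-two core of the unbiased MMD^2 estimator.

    Each z is a pair (x, y) with x drawn from P and y drawn from Q.
    """
    (x1, y1), (x2, y2) = z1, z2
    return (
        gaussian_kernel(x1, x2, bandwidth)
        + gaussian_kernel(y1, y2, bandwidth)
        - gaussian_kernel(x1, y2, bandwidth)
        - gaussian_kernel(x2, y1, bandwidth)
    )


def u_statistic(samples, core):
    """Average of a symmetric core over all unordered pairs of distinct samples."""
    n = len(samples)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += core(samples[i], samples[j])
    return 2.0 * total / (n * (n - 1))


def simulate_mmd(n, d, bandwidth, shift=0.5, seed=0):
    """Hypothetical setup: P = N(0, I_d), Q = N(shift * 1, I_d)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [rng.gauss(shift, 1.0) for _ in range(d)]
        samples.append((x, y))
    return u_statistic(samples, lambda a, b: mmd_core(a, b, bandwidth))
```

Calling `simulate_mmd(n, d, bandwidth)` over a grid of `d` and `bandwidth` values gives a rough way to observe how the statistic behaves as the dimension grows relative to the sample size. It does not reproduce the bounds or the power predictions described above.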