Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and the sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with isotropic kernels, which includes MMD with the Gaussian and Laplace kernels, as well as the energy distance, as special cases. We derive asymptotic expansions of the kernel two-sample statistics and use them to establish central limit theorems (CLTs) under the null hypothesis and under both local and fixed alternatives. The new non-null CLTs allow an asymptotically exact power analysis, which reveals a delicate interplay between the moment discrepancies that the kernel two-sample tests can detect and the relative orders of the dimension and the sample sizes. Numerical studies further corroborate the asymptotic theory.
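As a concrete point of reference for the statistic discussed above, the following is a minimal sketch of the standard unbiased estimator of the squared MMD with a Gaussian kernel. It is not the procedure analyzed in this work: the function names (`gaussian_kernel`, `mmd2_unbiased`, `permutation_pvalue`) are hypothetical, the bandwidth is left to the caller, and the test is calibrated by permutation rather than by the CLT-based approximation the abstract refers to.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of squared MMD between samples x (n x d) and y (m x d)."""
    n, m = x.shape[0], y.shape[0]
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Within-sample terms exclude the diagonal (i == j) entries.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    term_xy = kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy


def permutation_pvalue(x, y, bandwidth, n_perm=200, seed=0):
    """Permutation p-value (an assumption here, swapped in for CLT-based calibration)."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = mmd2_unbiased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A call such as `permutation_pvalue(x, y, bandwidth=np.sqrt(x.shape[1]))` uses a bandwidth that grows with the dimension; that choice is an illustrative assumption rather than a recommendation taken from the text.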