Kernel methods provide a flexible and powerful framework for nonparametric statistical testing by embedding probability distributions into a reproducing kernel Hilbert space (RKHS). In this work, we study the kernel two-sample testing problem and focus on a normalized version of the Maximum Mean Discrepancy (MMD) as the test statistic. This version scales the discrepancy by the within-group covariance operator to account for data variability, a normalization that has been shown to improve test power in both theoretical and empirical settings. Because the normalization requires regularization, we study the non-asymptotic properties of the spectrally truncated normalized MMD (st-nMMD) and derive an exponential upper bound under the null hypothesis. From this result, we obtain a sharp and explicit upper bound for the corresponding non-asymptotic quantile, along with a data-adaptive estimator. We further propose an algorithm that tunes the hyperparameters involved in the quantile estimation, including the truncation level, without requiring data splitting. We demonstrate the performance of the st-nMMD through numerical experiments under both the null and alternative hypotheses.
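For readers unfamiliar with the baseline quantity, the following is a minimal sketch of the standard (unnormalized) biased empirical squared MMD between two samples. It is not the st-nMMD studied here: it omits the covariance normalization and the spectral truncation. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_gram` and `mmd2_biased` are illustrative assumptions, not taken from this work.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    # Pairwise squared distances via the expansion ||a||^2 + ||b||^2 - 2 <a, b>.
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y.

    Equals the squared RKHS distance between the two empirical mean embeddings.
    The kernel choice and bandwidth are assumptions made for illustration.
    """
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same = mmd2_biased(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
    shifted = mmd2_biased(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 3.0)
    print(f"same distribution: {same:.4f}, shifted distribution: {shifted:.4f}")
```

A normalized variant would additionally rescale the mean-embedding difference by a regularized within-group covariance operator; the details of that step are specific to the method and are not reproduced in this sketch.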