The kernel two-sample test based on the maximum mean discrepancy (MMD) is one of the most popular methods for detecting differences between two distributions over general metric spaces. In this paper we propose a method to boost the power of the kernel test by combining MMD estimates over multiple kernels using their Mahalanobis distance. We derive the asymptotic null distribution of the proposed test statistic and use a multiplier bootstrap approach to efficiently compute the rejection region. The resulting test is universally consistent and, because it aggregates over a collection of kernels/bandwidths, is more powerful in detecting a wide range of alternatives in finite samples. We also derive the distribution of the test statistic under both fixed and local contiguous alternatives. The latter, in particular, implies that the proposed test is statistically efficient, that is, it has non-trivial asymptotic (Pitman) efficiency. Extensive numerical experiments on both synthetic and real-world datasets illustrate the efficacy of the proposed method over single-kernel tests. Our asymptotic results rely on deriving the joint distribution of MMD estimates using the framework of multiple stochastic integrals, a framework that is more broadly useful, in particular for understanding the efficiency properties of recently proposed adaptive MMD tests based on kernel aggregation.
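To make the construction concrete, the following is a minimal Python sketch of a test of this general type. It is an illustration under stated assumptions, not the estimator or bootstrap scheme analyzed in the paper: it assumes equal sample sizes, Gaussian kernels indexed by bandwidth, a plug-in covariance estimate built from products of the kernel matrices, and Rademacher multipliers. The names `gaussian_gram`, `h_matrix`, and `multi_kernel_mmd_test` are hypothetical.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def h_matrix(x, y, bandwidth):
    """Paired MMD core h((x_i, y_i), (x_j, y_j)) with a zeroed diagonal."""
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    h = kxx + kyy - kxy - kxy.T
    np.fill_diagonal(h, 0.0)  # U-statistic: drop the i == j terms
    return h


def multi_kernel_mmd_test(x, y, bandwidths, n_boot=500, seed=0):
    """Mahalanobis-type aggregation of MMD estimates over several bandwidths.

    Assumption (illustrative only): x and y have the same number of rows.
    Returns the aggregated statistic and a multiplier-bootstrap p-value.
    """
    n = x.shape[0]
    if y.shape[0] != n:
        raise ValueError("this sketch assumes equal sample sizes")
    rng = np.random.default_rng(seed)
    hs = [h_matrix(x, y, bw) for bw in bandwidths]
    norm = n * (n - 1)

    # Vector of unbiased MMD^2 estimates, one entry per kernel.
    mmd = np.array([h.sum() for h in hs]) / norm

    # Plug-in estimate of the covariance of n * MMD^2 across kernels.
    sigma = np.array([[2.0 * np.sum(hr * hs_) / norm for hs_ in hs]
                      for hr in hs])
    sigma_inv = np.linalg.pinv(sigma)  # pseudo-inverse guards against near-singularity

    # Mahalanobis-type statistic combining all kernels.
    stat = n**2 * mmd @ sigma_inv @ mmd

    # Multiplier bootstrap: reweight the core with Rademacher signs e_i * e_j.
    eps = rng.choice([-1.0, 1.0], size=(n_boot, n))
    boot_mmd = np.stack(
        [np.sum((eps @ h) * eps, axis=1) for h in hs], axis=1) / norm
    boot_stats = n**2 * np.einsum("br,rs,bs->b", boot_mmd, sigma_inv, boot_mmd)

    p_value = (1 + np.sum(boot_stats >= stat)) / (n_boot + 1)
    return stat, p_value


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    x = data_rng.normal(size=(100, 2))
    y = data_rng.normal(loc=1.0, size=(100, 2))  # mean-shifted alternative
    print(multi_kernel_mmd_test(x, y, bandwidths=[0.5, 1.0, 2.0]))
```

Reusing the same multipliers across all kernels is what lets the bootstrap replicates retain the dependence between the individual MMD estimates, which the Mahalanobis-type aggregation relies on.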