We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. We prove exponential concentration bounds for the proposed statistics under both the null and the alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, so that the resulting test remains well calibrated without data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We demonstrate the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its power against current state-of-the-art kernel tests.
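To make the idea of a soft maximum over normalised MMD values concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several choices are assumptions made purely for illustration: the Gaussian kernel family, uniform weights over kernels, the particular normaliser (root mean square of the off-diagonal pooled Gram entries), the default temperature `lam`, and all function names. The point it illustrates is that the Gram matrices and normalisers are computed once from the pooled sample, so they depend on the data but not on how the pooled points are split into two groups, and the same quantities can be reused across every permutation.

```python
import numpy as np


def gaussian_gram(Z, bandwidth):
    """Gaussian-kernel Gram matrix on the pooled sample Z (shape N x d)."""
    sq = np.sum(Z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased MMD^2 estimate read off a pooled Gram matrix K."""
    n, m = len(idx_x), len(idx_y)
    Kxx = K[np.ix_(idx_x, idx_x)]
    Kyy = K[np.ix_(idx_y, idx_y)]
    Kxy = K[np.ix_(idx_x, idx_y)]
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()


def fused_statistic(grams, idx_x, idx_y, lam):
    """Soft maximum (log-mean-exp, uniform weights) of normalised MMD^2 values."""
    vals = []
    for K in grams:
        N = K.shape[0]
        off = K - np.diag(np.diag(K))
        # Assumed normaliser: computed from the pooled Gram matrix only,
        # so it is unchanged when the sample labels are permuted.
        norm = np.sqrt((off**2).sum() / (N * (N - 1)))
        vals.append(lam * mmd2_unbiased(K, idx_x, idx_y) / norm)
    vals = np.array(vals)
    mx = vals.max()  # subtract the max for numerical stability
    return (mx + np.log(np.mean(np.exp(vals - mx)))) / lam


def fuse_permutation_test(X, Y, bandwidths, lam=None, n_perms=200,
                          alpha=0.05, seed=0):
    """Permutation test using the fused statistic; returns (stat, p, reject)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n, N = len(X), len(X) + len(Y)
    if lam is None:
        lam = np.sqrt(n * (n - 1))  # assumed default temperature
    grams = [gaussian_gram(Z, b) for b in bandwidths]
    observed = fused_statistic(grams, np.arange(n), np.arange(n, N), lam)
    null = np.empty(n_perms)
    for b in range(n_perms):
        perm = rng.permutation(N)
        null[b] = fused_statistic(grams, perm[:n], perm[n:], lam)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perms)
    return observed, p_value, p_value <= alpha
```

As a usage example, calling `fuse_permutation_test(X, Y, bandwidths=[0.5, 1.0, 2.0])` on two arrays of shape `(n, d)` and `(m, d)` returns the fused statistic, a permutation p-value, and the rejection decision at level `alpha`. Because the observed statistic is counted among the permuted ones when forming the p-value, the smallest attainable p-value is `1 / (n_perms + 1)`.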