We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
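To make the "weighted soft maximum" concrete, the following is a minimal illustrative sketch rather than the authors' implementation. It assumes Gaussian kernels over a small hypothetical bandwidth grid, uniform weights, a normaliser taken as the root mean square of the off-diagonal kernel entries on the pooled sample, and a default temperature `lam` of `sqrt(n(n-1))`; all function and parameter names are made up for illustration. The kernels and normalisers are computed from the pooled sample only, so they do not change under permutations of the sample labels, which is what allows calibration by permutation without data splitting.

```python
import numpy as np


def gaussian_kernel(Z, bandwidth):
    """Gaussian kernel matrix on the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased squared-MMD estimate from a pooled kernel matrix K."""
    Kxx = K[np.ix_(idx_x, idx_x)]
    Kyy = K[np.ix_(idx_y, idx_y)]
    Kxy = K[np.ix_(idx_x, idx_y)]
    n, m = len(idx_x), len(idx_y)
    sxx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    syy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return sxx + syy - 2.0 * Kxy.mean()


def soft_max_statistic(kernels, norms, idx_x, idx_y, lam):
    """Uniformly weighted soft maximum (log-mean-exp) of normalised MMD values."""
    vals = np.array(
        [mmd2_unbiased(K, idx_x, idx_y) / s for K, s in zip(kernels, norms)]
    )
    a = lam * vals
    # Subtract the maximum before exponentiating for numerical stability.
    return (a.max() + np.log(np.mean(np.exp(a - a.max())))) / lam


def fused_permutation_test(X, Y, bandwidths=(0.5, 1.0, 2.0), lam=None,
                           n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n, N = len(X), len(X) + len(Y)
    # Kernels and normalisers use the pooled sample only (assumed choice),
    # so they are invariant to how the pooled points are split into X and Y.
    kernels = [gaussian_kernel(Z, b) for b in bandwidths]
    off_diag = ~np.eye(N, dtype=bool)
    norms = [np.sqrt(np.mean(K[off_diag] ** 2)) for K in kernels]
    lam = float(np.sqrt(n * (n - 1))) if lam is None else lam

    def stat(order):
        return soft_max_statistic(kernels, norms, order[:n], order[n:], lam)

    observed = stat(np.arange(N))
    permuted = [stat(rng.permutation(N)) for _ in range(n_perm)]
    p_value = (1 + sum(s >= observed for s in permuted)) / (n_perm + 1)
    return observed, p_value
```

A call such as `fused_permutation_test(X, Y)` with two NumPy arrays of shape `(n, d)` and `(m, d)` returns the fused statistic and a p-value; the null hypothesis is rejected when the p-value falls below the chosen level. The log-mean-exp form lies between the average and the maximum of the normalised MMD values, with larger `lam` pushing it towards the maximum.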