The k-sample testing problem involves determining whether $k$ groups of data points are each drawn from the same distribution. The standard method for k-sample testing in biomedicine is multivariate analysis of variance (MANOVA), despite its reliance on strong, and often unsuitable, parametric assumptions. Moreover, independence testing and k-sample testing are closely related, and several universally consistent high-dimensional independence tests, such as distance correlation (Dcorr) and the Hilbert-Schmidt Independence Criterion (Hsic), enjoy solid theoretical and empirical properties. In this paper, we prove that independence tests achieve universally consistent k-sample testing, and that k-sample statistics such as Energy and Maximum Mean Discrepancy (MMD) are precisely equivalent to Dcorr. An empirical evaluation of nonparametric independence tests showed that they generally perform better than the popular MANOVA test, even in Gaussian-distributed scenarios. The evaluation included several popular independence statistics and covered a comprehensive set of simulations. Additionally, the testing approach was extended to perform multiway and multilevel tests, which were demonstrated in a simulated study as well as on a real-world dataset of fMRI brain scans with a set of attributes.
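The reduction from k-sample testing to independence testing can be illustrated with a minimal sketch. It rests on several assumptions: the $k$ groups are pooled into one data matrix, the group memberships are encoded as a one-hot label matrix, Dcorr uses Euclidean distances in its biased (V-statistic) form, and the p-value comes from a permutation test. The functions `dcorr` and `ksample_dcorr` are hypothetical illustrations, not the paper's implementation. The key idea is that "are the $k$ groups identically distributed?" becomes "is the data independent of its group label?"

```python
import numpy as np


def _pairwise_euclidean(z):
    """Pairwise Euclidean distance matrix for the rows of z (shape n x p)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _double_center(d):
    """Double-center a distance matrix: subtract row/column means, add grand mean."""
    return d - d.mean(axis=1, keepdims=True) - d.mean(axis=0, keepdims=True) + d.mean()


def _dcorr_centered(a, b):
    """Squared distance correlation (biased V-statistic) from centered matrices."""
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float((a * b).mean() / denom)


def dcorr(x, y):
    """Hypothetical helper: Dcorr between row-aligned x (n x p) and y (n x q)."""
    return _dcorr_centered(_double_center(_pairwise_euclidean(x)),
                           _double_center(_pairwise_euclidean(y)))


def ksample_dcorr(samples, reps=1000, seed=None):
    """Hypothetical k-sample test: Dcorr between pooled data and one-hot labels.

    samples: list of k arrays, each of shape (n_i, p).
    Returns (statistic, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    x = np.vstack(samples)                       # pooled data, shape (n, p)
    labels = np.concatenate([np.full(len(s), i) for i, s in enumerate(samples)])
    y = np.eye(len(samples))[labels]             # one-hot labels, shape (n, k)

    a = _double_center(_pairwise_euclidean(x))
    b = _double_center(_pairwise_euclidean(y))
    stat = _dcorr_centered(a, b)

    # Null distribution: shuffling labels = permuting rows and columns of b.
    null = np.empty(reps)
    for r in range(reps):
        perm = rng.permutation(len(labels))
        null[r] = _dcorr_centered(a, b[np.ix_(perm, perm)])
    pvalue = (1 + np.sum(null >= stat)) / (1 + reps)
    return stat, float(pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three groups of 2-D Gaussians; the third group's mean is shifted.
    groups = [rng.normal(loc=m, size=(30, 2)) for m in (0.0, 0.0, 1.5)]
    stat, p = ksample_dcorr(groups, reps=500, seed=1)
    print(f"Dcorr = {stat:.3f}, permutation p-value = {p:.4f}")
```

Double-centering commutes with permutation. Shuffling the label rows therefore gives the same result as re-indexing the already-centered label distance matrix, so the sketch centers each matrix once and only re-indexes inside the permutation loop.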