An important class of two-sample multivariate homogeneity tests is based on identifying differences between the distributions of interpoint distances. While generating distances from point clouds offers a straightforward and intuitive means of dimensionality reduction, it also introduces dependencies among the resulting distances. We propose a simple test based on Wilcoxon's rank sum statistic, for which we prove asymptotic normality under the null hypothesis and under fixed alternatives, assuming mild conditions on the underlying distributions of the point clouds. Furthermore, we show consistency of the test and derive a variance approximation that allows us to construct a computationally feasible, distribution-free test with good finite-sample performance. The power and robustness of the test for high-dimensional data and small sample sizes are demonstrated by numerical simulations. Finally, we apply the proposed test to case-control testing on microarray data in genetic studies, a setting notorious for its large number of variables and small sample sizes.
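To make the idea concrete, the following is a minimal sketch of a rank sum comparison of interpoint distances. It rests on several assumptions that the abstract does not confirm: it compares the within-sample Euclidean distances of the two samples, and it calibrates the statistic by permuting the pooled points rather than by the variance approximation derived in the paper (the dependence between distances makes the classical Wilcoxon variance unsuitable). The function names `within_distances`, `rank_sum_statistic`, and `distance_rank_sum_test` are hypothetical.

```python
import numpy as np


def within_distances(points):
    """Euclidean distances between all distinct pairs of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return dist[upper]


def rank_sum_statistic(a, b):
    """Mann-Whitney form of the rank sum statistic, scaled to [0, 1].

    Estimates P(A < B) + 0.5 * P(A = B); a value of 0.5 means neither
    sample tends to be larger than the other.
    """
    b_sorted = np.sort(b)
    below = np.searchsorted(b_sorted, a, side="left")   # count of b < a
    up_to = np.searchsorted(b_sorted, a, side="right")  # count of b <= a
    greater = len(b) - up_to
    ties = up_to - below
    return (greater + 0.5 * ties).sum() / (len(a) * len(b))


def distance_rank_sum_test(x, y, n_perm=199, seed=0):
    """Permutation-calibrated rank sum test on within-sample distances.

    Assumption: permuting the pooled points stands in for the paper's
    variance approximation, which is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)

    def stat(z):
        return rank_sum_statistic(within_distances(z[:n]),
                                  within_distances(z[n:]))

    observed = stat(pooled)
    # Distances are recomputed after each relabelling of the points,
    # so the dependence structure among distances is preserved.
    perm_dev = np.array([
        abs(stat(pooled[rng.permutation(len(pooled))]) - 0.5)
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(perm_dev >= abs(observed - 0.5))) / (n_perm + 1)
    return observed, p_value
```

A call such as `distance_rank_sum_test(x, y)` with `x` and `y` given as arrays of shape `(n, d)` and `(m, d)` returns the statistic and a two-sided permutation p-value. Within-sample Euclidean distances are invariant under translation, so this particular pairing cannot detect a pure location shift; comparing within-sample against between-sample distances is one alternative.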