Distribution-free tests such as the Wilcoxon rank-sum test are popular for testing the equality of two univariate distributions. An important reason for their popularity is the pair of striking results due to Hodges and Lehmann (1956) and Chernoff and Savage (1958), who show that the asymptotic (Pitman) relative efficiency of Wilcoxon's test with respect to Student's $t$-test, under location-shift alternatives, never falls below $0.864$ (with the identity score) and $1$ (with the Gaussian score), respectively, even though Wilcoxon's test is exactly distribution-free for all sample sizes. Motivated by these results, we propose and study a large family of exactly distribution-free, multivariate, rank-based two-sample tests by leveraging the theory of optimal transport. First, we propose distribution-free analogs of the Hotelling $T^2$ test (the natural multidimensional counterpart of Student's $t$-test) and show that they satisfy Hodges-Lehmann-type and Chernoff-Savage-type efficiency lower bounds over natural sub-families of multivariate distributions, despite being entirely agnostic to the underlying data-generating mechanism. This makes them the first multivariate, nonparametric, exactly distribution-free tests that provably achieve such efficiency lower bounds. Because these tests are derived from the Hotelling $T^2$ test, they are not universally consistent, just as Wilcoxon's test is not. To overcome this, we propose exactly distribution-free versions of the celebrated kernel maximum mean discrepancy test and the energy test. These tests are universally consistent without any moment assumptions, exactly distribution-free for all sample sizes, and have non-trivial Pitman efficiency. We believe this combination of three properties has not yet been proven for any existing test in the literature.
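To make the rank-based Hotelling-type construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes that multivariate ranks are obtained by optimally matching the pooled sample to a fixed set of reference points under squared Euclidean cost, and that a Hotelling-type statistic is then computed on those ranks with the identity score. The function names (`ot_ranks`, `rank_hotelling_stat`, `rank_hotelling_test`), the pseudo-random uniform reference grid, and the Monte Carlo calibration are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def ot_ranks(pooled, grid):
    """Match each pooled observation to a distinct reference point by
    minimising the total squared Euclidean cost (a discrete optimal
    transport, i.e. assignment, problem)."""
    cost = cdist(pooled, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    ranks = np.empty_like(grid)
    ranks[rows] = grid[cols]
    return ranks


def rank_hotelling_stat(ranks, m):
    """Hotelling-type statistic on the ranks (first m rows = sample 1)."""
    n = ranks.shape[0] - m
    diff = ranks[:m].mean(axis=0) - ranks[m:].mean(axis=0)
    # The pooled covariance of the ranks depends only on the fixed grid.
    cov = np.atleast_2d(np.cov(ranks, rowvar=False))
    return (m * n / (m + n)) * diff @ np.linalg.solve(cov, diff)


def rank_hotelling_test(x, y, n_perm=999, seed=0):
    """Return (statistic, Monte Carlo p-value) for H0: same distribution."""
    m, d = x.shape
    total = m + y.shape[0]
    # Hypothetical reference points, chosen without looking at the data.
    grid = np.random.default_rng(12345).uniform(size=(total, d))
    ranks = ot_ranks(np.vstack([x, y]), grid)
    observed = rank_hotelling_stat(ranks, m)
    # Under H0 (continuous data) the ranks are a uniformly random
    # permutation of the grid, so the null law depends only on the grid
    # and the sample sizes; we approximate it by random permutations.
    rng = np.random.default_rng(seed)
    null = np.array([
        rank_hotelling_stat(grid[rng.permutation(total)], m)
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Because the simulated null statistics are built from the reference grid alone, the rejection threshold does not depend on the unknown data distribution, which is the sense in which such a test is distribution-free. A Gaussian-score variant would apply a score transformation to the ranks before computing the statistic; that step is omitted here.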