Uniform-over-dimension convergence with application to location tests for high-dimensional data

Asymptotic methods for hypothesis testing in high-dimensional data usually require the dimension of the observations to increase to infinity, often with an additional condition on its rate of increase relative to the sample size. Multivariate asymptotic methods, on the other hand, are valid for fixed dimension only, and their practical implementations in hypothesis testing typically require the sample size to be large compared to the dimension to yield desirable results. In practice, however, it is usually not possible to determine whether the dimension of the data at hand conforms to the conditions required for the validity of the high-dimensional asymptotic methods, or whether the sample size is large enough compared to the dimension of the data. In this work, a theory of asymptotic convergence is proposed that holds uniformly over the dimension of the random vectors. This theory attempts to unify the asymptotic results for fixed-dimensional multivariate data and high-dimensional data, and accounts for the effect of the dimension of the data on the performance of hypothesis testing procedures. The methodology developed from this asymptotic theory can be applied to data of any dimension. An application of the theory is demonstrated in the two-sample test for the equality of locations. The proposed test statistic is not scaled by the sample covariance, similar to the usual tests for high-dimensional data. Simulated examples demonstrate that the proposed test performs better than several popular tests in the literature for high-dimensional data. Simulated models further demonstrate that the proposed unscaled test performs better than the usual scaled two-sample tests for multivariate data, including Hotelling's $T^2$ test for multivariate Gaussian data.
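To make the idea of a statistic that is "unscaled by the sample covariance" concrete, the following is a minimal sketch and not the test proposed in this work. It assumes the simplest unscaled statistic, the squared Euclidean distance between the two sample mean vectors, and calibrates it by permutation as a stand-in for the uniform-over-dimension asymptotic theory, which the abstract does not spell out. The function names `unscaled_statistic` and `permutation_pvalue` are hypothetical. Unlike Hotelling's $T^2$, this statistic needs no inverse of the sample covariance matrix, so it remains computable when the dimension exceeds the sample sizes.

```python
import numpy as np


def unscaled_statistic(x, y):
    """Squared Euclidean distance between the two sample mean vectors.

    Illustrative assumption only: no scaling by the sample covariance.
    """
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the unscaled statistic.

    Permutation calibration is a stand-in here, not the asymptotic
    calibration developed in the paper.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    observed = unscaled_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        idx = rng.permutation(pooled.shape[0])
        stat = unscaled_statistic(pooled[idx[:n]], pooled[idx[n:]])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
x = rng.normal(size=(20, 100))        # dimension 100 exceeds sample size 20
y = rng.normal(size=(25, 100)) + 0.5  # location shifted by 0.5 per coordinate
p_shift = permutation_pvalue(x, y)
```

In this simulated setting the pooled sample covariance matrix is singular (100 dimensions, 45 observations), so the classical Hotelling's $T^2$ statistic cannot be formed, while the unscaled statistic is still well defined.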
