In this paper, we propose a new test for the equality of two population covariance matrices in the ultra-high-dimensional setting, where the dimension is much larger than both sample sizes. Our methodology relies on a data-splitting procedure and a comparison of a set of well-selected eigenvalues of the sample covariance matrices computed on the split data sets. Compared with existing methods, our methodology is adaptive in the sense that (i) it requires no specific assumption on the two sample sizes (e.g., that they be comparable or balanced); (ii) it requires no quantitative or structural assumptions on the population covariance matrices; and (iii) it requires neither parametric distributions nor detailed knowledge of the moments of the two populations. Theoretically, we establish the asymptotic distributions of the statistics used in our method and conduct a power analysis, showing that our method remains powerful under very weak alternatives. Extensive numerical simulations show that our method significantly outperforms existing ones in terms of both size and power. We also analyze two real data sets to demonstrate the usefulness and superior performance of the proposed methodology. An $\texttt{R}$ package, $\texttt{UHDtst}$, is developed for easy implementation of the proposed methodology.
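The data-splitting and eigenvalue-comparison idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's test statistic and not the interface of the $\texttt{UHDtst}$ package: the function names `gram_matrix`, `top_eigenvalues`, and `eigen_gap`, the choice of the top-k eigenvalues, the equal-size halves, and the absolute-difference discrepancy are all assumptions made for illustration. The only fact it relies on is that the nonzero eigenvalues of the p x p sample covariance matrix coincide with those of the n x n centered Gram matrix, which is cheap to form when p is much larger than n.

```python
import math
import random


def gram_matrix(rows):
    """Centered n x n Gram matrix X_c X_c^T / (n - 1).

    Its nonzero eigenvalues equal those of the p x p sample covariance,
    so only an n x n matrix is needed when p is much larger than n.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]


def top_eigenvalues(mat, k, iters=500, seed=0):
    """Approximate top-k eigenvalues of a symmetric PSD matrix.

    Uses power iteration with deflation; k should not exceed the rank
    (at most n - 1 for a centered Gram matrix of n rows).
    """
    rng = random.Random(seed)
    n = len(mat)
    a = [row[:] for row in mat]  # work on a copy; deflation modifies it
    eigs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # matrix already fully deflated
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v gives the eigenvalue estimate.
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, av))
        eigs.append(lam)
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return sorted(eigs, reverse=True)


def eigen_gap(rows_a, rows_b, k):
    """Hypothetical discrepancy: sum of |difference| of top-k eigenvalues."""
    ea = top_eigenvalues(gram_matrix(rows_a), k)
    eb = top_eigenvalues(gram_matrix(rows_b), k)
    return sum(abs(x - y) for x, y in zip(ea, eb))


if __name__ == "__main__":
    rng = random.Random(1)
    p, n = 200, 20  # dimension much larger than the sample size
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [[2.0 * rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Split X into two equal halves: the within-sample gap serves as a
    # reference for the fluctuation expected when covariances are equal.
    within = eigen_gap(x[: n // 2], x[n // 2:], k=3)
    between = eigen_gap(x[: n // 2], y[: n // 2], k=3)
    print("within-sample gap:", within)
    print("between-sample gap:", between)
```

The halves are kept the same size because, when p is much larger than n, the scale of the sample eigenvalues depends on the sample size; comparing halves of unequal size would require a normalization that this sketch does not attempt.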