Determining the rank precisely is an important problem in many large-scale applications where matrix data are modeled as low-rank plus noise. In this paper, we propose a universal approach, rank inference via residual subsampling (RIRS), for testing and estimating the rank in a wide family of models that includes many popular network models, such as the degree-corrected mixed membership model, as special cases. Our procedure constructs a test statistic by subsampling entries of the residual matrix after extracting the spiked components. The test statistic converges in distribution to the standard normal under the null hypothesis and diverges to infinity with asymptotic probability one under the alternative hypothesis. The effectiveness of the RIRS procedure is justified theoretically, using the asymptotic expansions of eigenvectors and eigenvalues for large random matrices recently developed in [11] and [12]. The advantages of the proposed procedure are demonstrated through several simulation and real data examples.
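As a rough illustration of the procedure summarized above, the following Python sketch extracts the `k0` leading eigen-components of a symmetric data matrix, subsamples the upper-triangular residual entries independently with probability `1/m`, and self-normalizes the resulting sum. The function name `rirs_statistic`, the parameters `k0` and `m`, and the particular normalization are assumptions made for illustration only; the exact statistic and the conditions on the subsampling rate are those of the paper and are not reproduced here.

```python
import numpy as np


def rirs_statistic(X, k0, m, rng):
    """Simplified residual-subsampling statistic for H0: rank == k0.

    X   : symmetric (n, n) data matrix, assumed low-rank signal plus noise
    k0  : hypothesised rank (number of spiked components to remove)
    m   : subsampling parameter; each entry is kept with probability 1/m
    rng : numpy.random.Generator used for the subsampling
    """
    n = X.shape[0]

    # Extract the k0 eigen-components with largest absolute eigenvalue.
    vals, vecs = np.linalg.eigh(X)
    top = np.argsort(np.abs(vals))[::-1][:k0]
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T

    # Residual matrix after removing the estimated spiked part.
    resid = X - low_rank

    # Subsample strictly upper-triangular residual entries (i < j),
    # each kept independently with probability 1/m.
    w = resid[np.triu_indices(n, k=1)]
    keep = rng.random(w.size) < 1.0 / m

    # Self-normalised sum of the sampled residuals (illustrative scaling).
    return float(np.sqrt(m) * w[keep].sum() / np.sqrt((w ** 2).sum()))


# Hypothetical usage on synthetic data: rank-2 signal plus symmetric noise.
rng = np.random.default_rng(0)
n = 200
V = rng.normal(size=(n, 2))
noise = rng.normal(size=(n, n))
noise = (noise + noise.T) / np.sqrt(2)
X = V @ np.diag([3.0, 2.0]) @ V.T + noise
t_stat = rirs_statistic(X, k0=2, m=20, rng=rng)
```

In this sketch, a value of `t_stat` that is large in absolute value relative to standard normal quantiles would count as evidence against the hypothesised rank `k0`; how `m` should grow with `n` for that comparison to be valid is a matter for the paper's theory.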