The behavior of the leading singular values and vectors of noisy low-rank matrices is fundamental to many statistical and scientific problems. Theoretical understanding currently derives from asymptotic analysis under one of two regimes: (1) the classical regime, with a fixed number of rows and a large number of columns, or vice versa, and (2) the proportional regime, with large numbers of rows and columns, proportional to one another. This paper is concerned with the disproportional regime, where the matrix is either ``tall and narrow'' or ``short and wide'': we study sequences of matrices of size $n \times m_n$ with aspect ratio $n/m_n \rightarrow 0$ or $n/m_n \rightarrow \infty$ as $n \rightarrow \infty$. This regime has important ``big data'' applications. Theory derived here shows that the displacement of the empirical singular values and vectors from their noise-free counterparts, and the associated phase transitions (both well known under proportional-growth asymptotics), still occur in the disproportional regime. They must be quantified, however, on a novel scale of measurement that adjusts with the changing aspect ratio as the matrix size increases. In this regime, the top singular vectors corresponding to the longer of the two matrix dimensions are asymptotically uncorrelated with the noise-free signal.
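The last claim of the abstract can be checked numerically in a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes a rank-one signal, i.i.d. Gaussian noise with variance $1/m$, and a signal strength $\theta = \tau \beta^{1/4}$ with $\beta = n/m$, a scaling chosen here because $\beta^{1/4}$ is the detection threshold in the proportional regime under this normalization. The function name `top_singular_alignment` and the parameter `tau` are hypothetical.

```python
import numpy as np


def top_singular_alignment(n, m, tau, seed=0):
    """Simulate one rank-one-plus-noise matrix of size n x m (n << m).

    Returns the top empirical singular value and the squared cosines
    between the top empirical and true singular vectors, for the short
    (length-n) and long (length-m) dimensions respectively.
    """
    rng = np.random.default_rng(seed)
    beta = n / m

    # Assumption: signal strength measured on the beta**(1/4) scale.
    theta = tau * beta ** 0.25

    # Random unit-norm signal directions.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(m)
    v /= np.linalg.norm(v)

    # Assumption: i.i.d. Gaussian noise entries with variance 1/m.
    noise = rng.standard_normal((n, m)) / np.sqrt(m)
    X = theta * np.outer(u, v) + noise

    # Thin SVD: full_matrices=False avoids building an m x m factor.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    cos2_short = float((U[:, 0] @ u) ** 2)  # length-n singular vector
    cos2_long = float((Vt[0] @ v) ** 2)     # length-m singular vector
    return float(s[0]), cos2_short, cos2_long


if __name__ == "__main__":
    top_value, cos2_short, cos2_long = top_singular_alignment(100, 40000, tau=2.0)
    print(f"top singular value:        {top_value:.3f}")
    print(f"cos^2, short-side vector:  {cos2_short:.3f}")
    print(f"cos^2, long-side vector:   {cos2_long:.3f}")
```

With $n = 100$ and $m = 40000$, the singular vector on the short side should remain well aligned with the signal, while the one on the long side should be only weakly aligned. Shrinking $n/m$ further at fixed `tau` should push the long-side alignment toward zero, in line with the abstract's statement.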