Whitening is a classical technique in unsupervised learning that can facilitate estimation tasks by standardizing data. An important application is the estimation of latent variable models via the decomposition of tensors built from high-order moments. In particular, whitening orthogonalizes the means of a spherical Gaussian mixture model (GMM), thereby making the corresponding moment tensor orthogonally decomposable, hence easier to decompose. However, in the large-dimensional regime (LDR) where data are high-dimensional and scarce, the standard whitening matrix built from the sample covariance becomes ineffective because the latter is spectrally distorted. Consequently, whitened means of a spherical GMM are no longer orthogonal. Using random matrix theory, we derive exact limits for their dot products, which are generally nonzero in the LDR. As our main contribution, we then construct a corrected whitening matrix that restores asymptotic orthogonality, allowing for performance gains in spherical GMM estimation.
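The orthogonalization property described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's method: it uses the noiseless population "signal" second moment S = Σ_j w_j μ_j μ_jᵀ of a hypothetical spherical GMM (dimensions, weights, and means are all made up), builds the classical whitening matrix from the top-k eigenpairs of S, and checks that the whitened, weight-scaled means become orthonormal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 3  # hypothetical dimension and number of mixture components

# Random (generically non-orthogonal) component means and mixing weights.
M = rng.normal(size=(d, k))
w = np.array([0.5, 0.3, 0.2])

# Population "signal" second moment: S = sum_j w_j mu_j mu_j^T (rank k).
S = (M * w) @ M.T

# Classical whitening matrix from the top-k eigenpairs: W = D^{-1/2} U^T,
# so that W S W^T = I_k.
vals, vecs = np.linalg.eigh(S)
U, D = vecs[:, -k:], vals[-k:]
W = np.diag(D ** -0.5) @ U.T

# Whitened, weight-scaled means sqrt(w_j) * (W mu_j) are orthonormal.
G = (W @ M) * np.sqrt(w)
print(np.allclose(G.T @ G, np.eye(k)))  # True
```

In the large-dimensional regime the abstract targets, S would be estimated from few samples, its eigenpairs would be spectrally distorted, and the check above would fail; the paper's corrected whitening matrix is designed to restore this orthogonality asymptotically.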