We consider a statistical model for symmetric matrix factorization with additive Gaussian noise in the high-dimensional regime where the rank $M$ of the signal matrix to be inferred scales with its size $N$ as $M={\rm o}(\sqrt{\ln N})$. Allowing for an $N$-dependent rank poses new challenges and requires new methods. Working in the Bayes-optimal setting, we show that whenever the signal has i.i.d.~entries, the limiting mutual information between signal and data is given by a variational formula involving a rank-one replica symmetric potential. In other words, from the information-theoretic perspective, the case of a (slowly) growing rank is the same as when $M=1$ (namely, the standard spiked Wigner model). The proof rests primarily on a novel multiscale cavity method allowing for a growing rank, combined with information-theoretic identities on the worst-case noise for the vector Gaussian channel. We believe that the cavity method developed here will play a role in the analysis of a broader class of inference and spin models in which the degrees of freedom are large arrays rather than vectors.
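For concreteness, the rank-one replica symmetric potential mentioned above can be sketched in the form familiar from the spiked Wigner literature (the notation here is illustrative: $P_0$ denotes the prior of the i.i.d.~entries, $\rho=\mathbb{E}_{P_0}[X_0^2]$, and $\lambda$ the signal-to-noise ratio):
$$
\lim_{N\to\infty}\frac{1}{N}\,I(\mathbf{X};\mathbf{Y})
= \min_{q\in[0,\rho]}\Big\{\frac{\lambda}{4}\,(\rho-q)^2
+ I\big(X_0;\sqrt{\lambda q}\,X_0+Z_0\big)\Big\},
$$
where $X_0\sim P_0$ and $Z_0\sim\mathcal{N}(0,1)$ are independent, so that the second term is the mutual information of a scalar Gaussian channel with effective signal-to-noise ratio $\lambda q$.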