How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide the first characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation-invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical mechanics. We thus propose a novel AMP, inspired by the theory of Adaptive Thouless-Anderson-Palmer equations, which saturates the theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing to remarkable universality properties.
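As a rough illustration of the setting described above, the following sketch builds a rank-one spike corrupted by rotation-invariant noise and measures how well the top eigenvector (standard spectral PCA) aligns with the spike. Everything specific here is an assumption made for illustration and is not taken from the abstract: the scaling `snr / n` of the spike, the Rademacher prior on `x`, and above all the noise spectrum, which is a simple evenly spaced placeholder rather than the eigenvalue distribution of the polynomial ensemble studied in the paper. The sketch covers only the PCA baseline; it does not implement the proposed AMP.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the Haar measure via QR."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # Multiplying each column by the sign of the matching diagonal entry
    # of R removes the sign ambiguity of QR and makes Q Haar-distributed.
    return q * np.sign(np.diag(r))


def spiked_matrix(n, snr, rng):
    """Return (Y, x) with Y = (snr / n) * x x^T + Z, Z rotation-invariant."""
    # Assumption: Rademacher spike, so that ||x||^2 = n.
    x = rng.choice([-1.0, 1.0], size=n)
    # Assumption: placeholder unit-variance spectrum on [-sqrt(3), sqrt(3)];
    # this is NOT the spectrum of the paper's polynomial ensemble.
    eigs = np.linspace(-np.sqrt(3.0), np.sqrt(3.0), n)
    o = haar_orthogonal(n, rng)
    z = (o * eigs) @ o.T  # Z = O diag(eigs) O^T
    y = (snr / n) * np.outer(x, x) + z
    return y, x


def pca_overlap(y, x):
    """Absolute cosine between the top eigenvector of Y and the spike x."""
    _, vecs = np.linalg.eigh(y)  # eigenvalues are returned in ascending order
    v = vecs[:, -1]
    return abs(v @ x) / np.linalg.norm(x)


rng = np.random.default_rng(0)
y, x = spiked_matrix(400, 5.0, rng)
overlap = pca_overlap(y, x)
print(f"PCA overlap with the spike: {overlap:.3f}")
```

Writing the noise as `O diag(eigs) O^T` with a Haar-distributed `O` is what makes it rotation-invariant: its entries are correlated through the shared eigenvectors, unlike the independent entries of a Wigner matrix. Replacing `eigs` with samples from another spectrum changes the noise structure without touching the rest of the sketch.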