Bayes-optimal limits in structured PCA, and how to reach them

How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide the first characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation-invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical mechanics. We thus propose a novel AMP, inspired by the theory of Adaptive Thouless-Anderson-Palmer equations, which saturates the theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at remarkable universality properties.
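To make the setting concrete, here is a minimal sketch of a spiked matrix with rotation-invariant noise, followed by the standard spectral PCA estimate of the spike. Several choices are assumptions made only for illustration and are not taken from the paper: the Rademacher spike prior, the uniform noise spectrum on [-1, 1] (the paper draws the noise from polynomial-potential ensembles instead), the `snr / n` scaling, and all function names.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the Haar measure via QR."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # Rescale each column by the sign of the matching diagonal entry of R;
    # this removes the sign ambiguity of QR so that Q is Haar-distributed.
    return q * np.sign(np.diag(r))


def spiked_matrix(n, snr, rng):
    """Return (Y, x) with Y = (snr / n) * x x^T + O diag(eigs) O^T."""
    x = rng.choice([-1.0, 1.0], size=n)      # assumed Rademacher spike
    eigs = rng.uniform(-1.0, 1.0, size=n)    # illustrative noise spectrum
    o = haar_orthogonal(n, rng)
    noise = (o * eigs) @ o.T                 # rotation-invariant, entries dependent
    y = (snr / n) * np.outer(x, x) + noise
    return y, x


def pca_overlap(y, x):
    """Squared cosine between the top eigenvector of Y and the spike x."""
    _, vecs = np.linalg.eigh(y)              # eigenvalues in ascending order
    v = vecs[:, -1]                          # unit-norm top eigenvector
    return (v @ x) ** 2 / (x @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, x = spiked_matrix(n=300, snr=3.0, rng=rng)
    print("squared overlap of spectral PCA:", pca_overlap(y, x))
```

The overlap is invariant to the sign of the eigenvector, which matters because a rank-one spike of the form x xᵀ can only be recovered up to a global sign. The sketch covers only the spectral baseline; it does not implement the AMP algorithm described in the abstract.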

PCA

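In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space onto a lower-dimensional one by choosing directions that maximize the variance of the projected data. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.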
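The construction described above can be sketched with a singular value decomposition of the centered data. This is one common way to compute principal components, shown here under stated assumptions: the function name `pca`, its return values, and the synthetic data in the usage example are all hypothetical.

```python
import numpy as np


def pca(data, k):
    """Return the top-k principal directions, projected scores, and variances.

    data: array of shape (n_samples, n_features).
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    scores = centered @ components.T               # coordinates in the new basis
    variances = s[:k] ** 2 / (data.shape[0] - 1)   # sample variance per component
    return components, scores, variances


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic correlated data: Gaussian points passed through a random linear map.
    data = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
    components, scores, variances = pca(data, k=2)
    print("variances along the components:", variances)
    print("covariance of the scores:")
    print(np.cov(scores.T))
```

In the printed covariance matrix the off-diagonal entries vanish up to floating-point error, which is the "uncorrelated dimensions" property mentioned in the text.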
