We provide a rigorous random matrix theory analysis of spiked cross-covariance models in which the signals in two high-dimensional data channels are partially aligned. These models are motivated by multi-modal learning and form the standard generative setting underlying Partial Least Squares (PLS), a widely used yet theoretically underdeveloped method. We show that the leading singular values of the sample cross-covariance matrix undergo a Baik-Ben Arous-Péché (BBP)-type phase transition, and we characterize the precise thresholds at which informative components emerge. Our results yield the first sharp asymptotic description of the signal recovery capabilities of PLS in this setting and reveal a fundamental performance gap between PLS and the Bayes-optimal estimator. In particular, we identify the signal-to-noise ratio (SNR) and correlation regimes in which PLS fails to recover any signal, even though the signal is detectable in principle. These findings clarify the theoretical limits of PLS and provide guidance for designing reliable multi-modal inference methods in high dimensions.
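To make the BBP-type transition concrete, the following is a minimal simulation sketch under an *assumed* rank-one model. The paper's exact model, normalization and thresholds are not reproduced here. The sketch plants unit directions `u` and `v`, correlates the latent scores of the two channels with correlation `rho`, forms the sample cross-covariance `S = X^T Y / n`, and takes its leading singular vectors as the first PLS component. The function name `simulate_pls_first_component` and the parameters `beta_x`, `beta_y` and `rho` are hypothetical names chosen for illustration. Qualitatively, the overlaps with the planted directions stay near zero at low SNR and approach one at high SNR.

```python
import numpy as np


def simulate_pls_first_component(n, p, q, beta_x, beta_y, rho, seed=0):
    """Illustrative (hypothetical) rank-one spiked cross-covariance model:

        x_i = sqrt(beta_x) * s_i * u + noise_x,   noise_x ~ N(0, I_p)
        y_i = sqrt(beta_y) * t_i * v + noise_y,   noise_y ~ N(0, I_q)

    with standard Gaussian latents (s_i, t_i) of correlation rho.
    Returns the top singular value of S = X^T Y / n and the overlaps
    |<u_hat, u>|, |<v_hat, v>| of its leading singular vectors.
    """
    rng = np.random.default_rng(seed)
    # Planted unit-norm directions u in R^p and v in R^q.
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(q)
    v /= np.linalg.norm(v)
    # Latent scores; corr(s_i, t_i) = rho sets the cross-channel alignment.
    s = rng.standard_normal(n)
    t = rho * s + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Data matrices, one sample per row.
    X = np.sqrt(beta_x) * np.outer(s, u) + rng.standard_normal((n, p))
    Y = np.sqrt(beta_y) * np.outer(t, v) + rng.standard_normal((n, q))
    # Sample cross-covariance (p x q); its expectation is
    # sqrt(beta_x * beta_y) * rho * u v^T.
    S = X.T @ Y / n
    # First PLS component: leading left/right singular vectors of S.
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    overlap_u = abs(U[:, 0] @ u)
    overlap_v = abs(Vt[0] @ v)
    return sing[0], overlap_u, overlap_v


if __name__ == "__main__":
    # Sweep a common SNR for both channels at fixed correlation.
    for beta in (0.1, 0.5, 1.0, 2.0, 5.0):
        top, ou, ov = simulate_pls_first_component(
            n=1000, p=200, q=200, beta_x=beta, beta_y=beta, rho=0.9, seed=1
        )
        print(f"beta={beta:4.1f}  top sv={top:.3f}  "
              f"|<u_hat,u>|={ou:.3f}  |<v_hat,v>|={ov:.3f}")
```

Sweeping `beta` or `rho` and averaging over several seeds gives an empirical picture of where the overlaps leave zero. This is only a finite-size illustration and is not a substitute for the analytic thresholds characterized in the paper.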