In the era of transformer models, masked self-supervised learning (SSL) has become a foundational training paradigm. A defining feature of masked SSL is that training aggregates predictions across many masking patterns, giving rise to a joint, matrix-valued predictor rather than a single vector-valued estimator. This object encodes how coordinates condition on one another and poses new analytical challenges. We develop a precise high-dimensional analysis of masked modeling objectives in the proportional regime, where the sample size grows proportionally with the ambient dimension. Our results provide explicit expressions for the generalization error and characterize the spectral structure of the learned predictor, revealing how masked modeling extracts structure from data. For spiked covariance models, we show that the joint predictor undergoes a Baik--Ben Arous--Péché (BBP)-type phase transition, identifying when masked SSL begins to recover latent signals. Finally, we identify structured regimes in which masked SSL provably outperforms PCA, highlighting potential advantages of SSL objectives over classical unsupervised methods.
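To make the joint matrix-valued predictor concrete, here is a minimal numerical sketch (our illustrative construction, not the paper's exact objective): under a spiked covariance model, row j of a matrix W is fit by regressing coordinate j on the remaining coordinates, a leave-one-out special case of masking, and the overlap between W's leading singular direction with the planted spike is then measured. The ridge parameter lam, spike strength beta, and the leave-one-out mask distribution are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 400                  # proportional regime: n/d held fixed
beta = 3.0                       # spike strength (illustrative choice)
v = rng.standard_normal(d)
v /= np.linalg.norm(v)           # unit-norm spike direction

# Data with spiked covariance Sigma = I_d + beta * v v^T
X = rng.standard_normal((n, d)) + np.sqrt(beta) * rng.standard_normal((n, 1)) * v

# Joint matrix-valued predictor under leave-one-out masking:
# row j of W holds the (ridge) regression coefficients predicting
# coordinate j from all remaining coordinates.
lam = 1e-2                       # small ridge term to keep the solves well-posed
G = X.T @ X / n + lam * np.eye(d)
W = np.zeros((d, d))
for j in range(d):
    idx = np.delete(np.arange(d), j)
    W[j, idx] = np.linalg.solve(G[np.ix_(idx, idx)], G[idx, j])

# Spectral structure of the learned predictor: overlap of its leading
# singular direction with the planted spike v.
U, s, _ = np.linalg.svd(W)
print(f"top singular value: {s[0]:.3f}")
print(f"overlap |<u1, v>|: {abs(U[:, 0] @ v):.3f}")
```

Sweeping beta and the ratio n/d in this sketch gives a qualitative feel for the BBP-type behavior the abstract describes: for weak spikes the overlap stays near zero, while for sufficiently strong spikes the leading direction of W correlates noticeably with v. The precise threshold and error formulas are the subject of the analysis itself.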