Self-supervised learning (SSL) excels at finding general-purpose latent representations from complex data, yet lacks a unifying theoretical framework that explains the diverse existing methods and guides the design of new ones. We cast SSL as latent distribution matching (LDM): learning representations that maximize their log-probability under an assumed latent model (alignment), while maximizing latent entropy to prevent collapse (uniformity). This view unifies independent component analysis with contrastive, non-contrastive, and predictive SSL methods, including stop-gradient approaches. Leveraging LDM, we derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-based predictor for high-dimensional time series. We further prove that predictive LDM yields identifiable latent representations under mild assumptions, even with nonlinear predictors. Overall, LDM clarifies the assumptions behind established SSL methods and provides principled guidance for developing new approaches.
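As a rough sketch of why the alignment and uniformity terms together amount to distribution matching, suppose an encoder $f_\theta$ maps data $x$ to latents $z = f_\theta(x)$ with induced density $q_\theta(z)$, and let $p(z)$ denote the assumed latent model. These symbols are illustrative notation of ours, not taken from the abstract, and the KL form below is one standard reading of "matching" rather than the paper's stated objective.

```latex
% Assumed notation: q_\theta is the latent density induced by the encoder,
% p is the assumed latent model, H[q] = -E_q[log q] is differential entropy.
\begin{align}
\mathcal{J}(\theta)
  &= \underbrace{\mathbb{E}_{z \sim q_\theta}\!\left[\log p(z)\right]}_{\text{alignment}}
   + \underbrace{H\!\left[q_\theta\right]}_{\text{uniformity}} \\
  &= \mathbb{E}_{z \sim q_\theta}\!\left[\log p(z) - \log q_\theta(z)\right] \\
  &= -\,D_{\mathrm{KL}}\!\left(q_\theta \,\|\, p\right) \;\le\; 0 .
\end{align}
```

Under these assumptions, maximizing $\mathcal{J}$ is equivalent to minimizing the KL divergence from $q_\theta$ to $p$, with the maximum attained when $q_\theta = p$. Dropping the entropy term would let the encoder map every input to a mode of $p$, which is the collapse that the uniformity term is meant to prevent.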