The Joint-Embedding Predictive Architecture (JEPA) is often seen as a non-generative alternative to likelihood-based self-supervised learning, emphasizing prediction in representation space rather than reconstruction in observation space. We argue that the resulting separation from probabilistic generative modeling is largely rhetorical rather than structural. The canonical JEPA design (coupled encoders with a context-to-target predictor) mirrors the variational posteriors and learned conditional priors obtained when variational inference is applied to a particular class of coupled latent-variable models. Standard JEPA can thus be viewed as a deterministic specialization in which regularization is imposed via architectural and training heuristics rather than an explicit likelihood. Building on this view, we derive the Variational JEPA (Var-JEPA), which makes the latent generative structure explicit by optimizing a single Evidence Lower Bound (ELBO). This yields meaningful representations without ad hoc anti-collapse regularizers and allows principled uncertainty quantification in the latent space. We instantiate the framework for tabular data (Var-T-JEPA) and achieve strong representation learning and downstream performance, consistently improving over T-JEPA while remaining competitive with strong raw-feature baselines.