We introduce Variational Joint Embedding (VJE), a reconstruction-free latent-variable framework for non-contrastive self-supervised learning in representation space. VJE maximizes a symmetric conditional evidence lower bound (ELBO) on paired encoder embeddings by defining a conditional likelihood directly on target representations, rather than optimizing a pointwise compatibility objective. The likelihood is instantiated as a heavy-tailed Student--\(t\) distribution on a polar representation of the target embedding, where a directional--radial decomposition separates angular agreement from magnitude consistency and mitigates norm-induced pathologies. The directional factor operates on the unit sphere, yielding a valid variational bound for the associated spherical subdensity model. An amortized inference network parameterizes a diagonal Gaussian posterior whose feature-wise variances are shared with the directional likelihood, yielding anisotropic uncertainty without auxiliary projection heads. Across ImageNet-1K, CIFAR-10/100, and STL-10, VJE is competitive with standard non-contrastive baselines under linear and \(k\)-NN evaluation, while providing probabilistic semantics directly in representation space for downstream uncertainty-aware applications. We validate these semantics through out-of-distribution detection, where representation-space likelihoods yield strong empirical performance. These results position the framework as a principled variational formulation of non-contrastive learning, in which structured feature-wise uncertainty is represented directly in the learned embedding space.
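The directional--radial decomposition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`polar_decompose`, `student_t_logpdf`, `polar_student_t_loglik`), the per-feature factorization of the directional term, the use of log-radius for the radial term, and the values of `nu` and `radial_scale` are all assumptions chosen for illustration. The directional term is evaluated on unit vectors with an ambient-space density, so it is not normalized over the sphere, which is one way to read the abstract's "spherical subdensity".

```python
import numpy as np
from math import lgamma, log, pi


def polar_decompose(z, eps=1e-8):
    """Split embeddings into unit directions and radii (norms)."""
    r = np.linalg.norm(z, axis=-1, keepdims=True)
    u = z / np.maximum(r, eps)
    return u, r[..., 0]


def student_t_logpdf(x, loc, scale, nu):
    """Elementwise log density of a univariate location-scale Student-t."""
    t = (x - loc) / scale
    log_norm = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return log_norm - np.log(scale) - 0.5 * (nu + 1) * np.log1p(t**2 / nu)


def polar_student_t_loglik(z_target, mu_pred, sigma, nu=3.0,
                           radial_scale=1.0, eps=1e-8):
    """Return (directional, radial) log-likelihood terms per sample.

    Assumptions (hypothetical, for illustration only):
      * directional term: independent Student-t factors per feature,
        evaluated on unit vectors, with feature-wise scales `sigma`;
      * radial term: Student-t on the difference of log-radii.
    """
    u_t, r_t = polar_decompose(z_target, eps)
    u_p, r_p = polar_decompose(mu_pred, eps)
    directional = student_t_logpdf(u_t, u_p, sigma, nu).sum(axis=-1)
    radial = student_t_logpdf(np.log(r_t + eps), np.log(r_p + eps),
                              radial_scale, nu)
    return directional, radial


# Usage: rescaling the target changes only the radial term.
z = np.array([[3.0, 4.0]])
mu = np.array([[3.0, 4.0]])
sigma = np.ones(2)  # stand-in for the posterior's feature-wise std. devs.
d1, r1 = polar_student_t_loglik(z, mu, sigma)
d2, r2 = polar_student_t_loglik(10.0 * z, mu, sigma)
```

In this sketch, multiplying the target embedding by a positive constant leaves the directional term unchanged and lowers only the radial term, which is the separation of angular agreement from magnitude consistency that the decomposition is meant to provide.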