We introduce Variational Joint Embedding (VJE), a framework that synthesizes joint embedding and variational inference to enable self-supervised learning of probabilistic representations in a reconstruction-free, non-contrastive setting. Compared to energy-based predictive objectives that optimize pointwise discrepancies, VJE maximizes a symmetric conditional evidence lower bound (ELBO) for a latent-variable model defined directly on encoder embeddings. We instantiate the conditional likelihood with a heavy-tailed Student-$t$ model using a polar decomposition that explicitly decouples directional and radial factors to prevent norm-induced instabilities during training. VJE employs an amortized inference network to parameterize a diagonal Gaussian variational posterior whose feature-wise variances are shared with the likelihood scale to capture anisotropic uncertainty without auxiliary projection heads. Across ImageNet-1K, CIFAR-10/100, and STL-10, VJE achieves performance comparable to standard non-contrastive baselines under linear and k-NN evaluation. We further validate these probabilistic semantics through one-class CIFAR-10 anomaly detection, where likelihood-based scoring under the proposed model outperforms comparable self-supervised baselines.
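The polar decomposition described above can be sketched in code. The following is a minimal illustrative sketch, not the paper's implementation: it assumes an independent-factor Student-$t$ log-density and scores the direction (unit vector) and log-radius of a target embedding separately, so that embedding norms cannot dominate the likelihood. The function names, the log-norm parameterization of the radial factor, and the default `df=3.0` are all assumptions for illustration.

```python
import numpy as np

def student_t_logpdf(x, mu, scale, df=3.0):
    """Log-density of independent Student-t factors (additive constants dropped).

    Heavy tails: the penalty grows logarithmically in the residual, so
    outlying embeddings produce bounded gradients, unlike a Gaussian.
    """
    z = (x - mu) / scale
    return np.sum(-0.5 * (df + 1.0) * np.log1p(z ** 2 / df) - np.log(scale), axis=-1)

def polar_student_t_loglik(z_pred, z_target, scale, df=3.0):
    """Sketch of a polar-decomposed likelihood on embeddings (hypothetical helper).

    Directional and radial factors are decoupled: the direction term compares
    unit vectors (norm-free), and the radial term compares log-norms, so a
    norm blow-up in either embedding cannot destabilize the directional fit.
    """
    # Directional factor: project both embeddings to the unit sphere.
    u_pred = z_pred / np.linalg.norm(z_pred, axis=-1, keepdims=True)
    u_tgt = z_target / np.linalg.norm(z_target, axis=-1, keepdims=True)
    ll_dir = student_t_logpdf(u_tgt, u_pred, scale, df)
    # Radial factor: compare log-norms with a single shared scale entry.
    r_pred = np.log(np.linalg.norm(z_pred, axis=-1, keepdims=True))
    r_tgt = np.log(np.linalg.norm(z_target, axis=-1, keepdims=True))
    ll_rad = student_t_logpdf(r_tgt, r_pred, scale[..., :1], df)
    return ll_dir + ll_rad
```

In the full objective this term would be evaluated symmetrically (each view's embedding conditioned on the other) and combined with the KL term of the conditional ELBO; the feature-wise `scale` would come from the amortized inference network rather than being a fixed constant as here.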