In this paper, we conduct a comprehensive analysis of two dual-branch (Siamese architecture) self-supervised learning approaches, namely Barlow Twins and spectral contrastive learning, through the lens of matrix mutual information. We prove that the loss functions of these methods implicitly optimize both matrix mutual information and matrix joint entropy. This insight prompts us to further explore the category of single-branch algorithms, specifically MAE and U-MAE, for which the matrix mutual information and joint entropy reduce to the matrix entropy. Building on this intuition, we introduce the Matrix Variational Masked Auto-Encoder (M-MAE), a novel method that leverages the matrix-based estimation of entropy as a regularizer and subsumes U-MAE as a special case. Empirical evaluations underscore the effectiveness of M-MAE compared with state-of-the-art methods, including a 3.9% improvement in linear probing with ViT-Base and a 1% improvement in fine-tuning with ViT-Large, both on ImageNet.
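To make the matrix-entropy regularizer concrete, here is a minimal sketch, assuming the estimator is the von Neumann entropy of the trace-normalized Gram matrix of a feature batch (the standard matrix-based entropy construction); the function name `matrix_entropy`, the batch shape, and the `lambda_reg` weighting in the comment are illustrative, not the paper's exact implementation.

```python
import torch


def matrix_entropy(Z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Von Neumann entropy of the trace-normalized Gram matrix of features.

    Z: (batch_size, dim) feature matrix, e.g. encoder outputs.
    Returns a scalar; maximizing it spreads the eigenvalue spectrum of the
    feature Gram matrix, which is the role the abstract ascribes to the
    matrix-entropy regularizer.
    """
    Z = torch.nn.functional.normalize(Z, dim=1)        # unit-norm feature rows
    K = Z @ Z.T                                        # Gram (kernel) matrix, PSD
    K = K / K.trace()                                  # unit trace: eigenvalues sum to 1
    eigvals = torch.linalg.eigvalsh(K).clamp_min(eps)  # symmetric PSD, so eigvalsh
    return -(eigvals * eigvals.log()).sum()


# Hypothetical use inside a masked-autoencoder training step:
#   loss = reconstruction_loss - lambda_reg * matrix_entropy(encoder_features)
```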