In this paper, we provide a comprehensive toolbox for understanding and enhancing self-supervised learning (SSL) methods through the lens of matrix information theory. Specifically, by leveraging the principles of matrix mutual information and joint entropy, we offer a unified analysis of both contrastive and feature-decorrelation-based methods. Furthermore, we propose the matrix variational masked auto-encoder (M-MAE), a method grounded in matrix information theory, as an enhancement to masked image modeling. Empirical evaluations on ImageNet show that M-MAE is effective compared with state-of-the-art methods, with a 3.9% improvement in linear probing with ViT-Base and a 1% improvement in fine-tuning with ViT-Large.
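To make the quantities named above concrete, the following is a minimal sketch of matrix entropy, joint entropy, and mutual information computed on Gram matrices of two sets of embeddings. The abstract does not state the definitions, so everything here is an assumption: entropy is taken to be the von Neumann entropy of the trace-normalized matrix, joint entropy is taken from the Hadamard (elementwise) product as in the matrix-based entropy literature, and the function names `gram`, `matrix_entropy`, `joint_entropy`, and `mutual_information` are hypothetical. The exact formulation used in the paper may differ.

```python
import numpy as np


def gram(z):
    """Cosine-similarity Gram matrix of row embeddings (unit diagonal)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T


def matrix_entropy(k):
    """Von Neumann entropy -tr(A log A) of A = K / tr(K) (assumed definition)."""
    a = k / np.trace(k)
    eig = np.linalg.eigvalsh(a)   # K is symmetric, so eigvalsh applies
    eig = eig[eig > 1e-12]        # drop zero / numerically negative eigenvalues
    return float(-np.sum(eig * np.log(eig)))


def joint_entropy(k1, k2):
    """Joint entropy via the Hadamard product of the two matrices (assumed)."""
    return matrix_entropy(k1 * k2)


def mutual_information(k1, k2):
    """H(K1) + H(K2) - H(K1, K2) under the assumed definitions above."""
    return matrix_entropy(k1) + matrix_entropy(k2) - joint_entropy(k1, k2)
```

Under these assumptions, `mutual_information(gram(z1), gram(z2))` would be evaluated on the embeddings `z1` and `z2` of two augmented views of the same batch. An identity Gram matrix (fully decorrelated samples) attains the maximum entropy `log n`, while an all-ones Gram matrix (fully collapsed embeddings) has entropy zero.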