The maximum entropy encoding framework provides a unified perspective on many non-contrastive learning methods, such as SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as a matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating a matrix alignment loss, which directly aligns the covariance matrices of different branches. Experimental results show that Matrix-SSL outperforms state-of-the-art methods on ImageNet under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, on MS-COCO transfer learning, our method outperforms previous SOTA methods such as MoCo v2 and BYOL by up to 3.3% while using only 400 epochs of pre-training instead of 800. We also explore introducing representation learning into language modeling, achieving 72.3% accuracy on the GSM8K dataset by fine-tuning a 7B model with the matrix cross-entropy loss, a margin of 3.1% over the standard cross-entropy loss. Code is available at https://github.com/yifanzhang-pro/Matrix-SSL.
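To make the uniformity and alignment terms more concrete, here is a minimal NumPy sketch. It assumes a matrix cross-entropy of the form MCE(P, Q) = tr(-P log Q + Q) between positive semi-definite matrices. It also assumes that each branch's covariance is computed from L2-normalized embeddings, so that the covariance has trace 1. Under those assumptions, uniformity is MCE(I/d, C), which is minimized when C = I/d, and alignment is MCE(C1, C2), which is minimized when C2 = C1. The helper names (`sym_logm`, `branch_covariance`, `uniformity_loss`, `alignment_loss`) are hypothetical. The paper's exact construction, for example its use of cross-branch covariances, regularization, or loss weighting, may differ from this sketch.

```python
import numpy as np


def sym_logm(a: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Matrix logarithm of a symmetric PSD matrix via eigendecomposition.

    Eigenvalues are clipped at eps so log() stays finite for rank-deficient inputs.
    """
    w, v = np.linalg.eigh(a)
    w = np.clip(w, eps, None)
    return (v * np.log(w)) @ v.T


def matrix_cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-6) -> float:
    """Assumed definition: MCE(P, Q) = tr(-P log Q + Q)."""
    return float(np.trace(-p @ sym_logm(q, eps) + q))


def branch_covariance(z: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (d, d) covariance of one branch's (batch, d) embeddings.

    Rows are L2-normalized first, so the result has trace 1.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z.T @ z / z.shape[0]


def uniformity_loss(z: np.ndarray) -> float:
    """MCE between the isotropic target I/d and the branch covariance."""
    d = z.shape[1]
    return matrix_cross_entropy(np.eye(d) / d, branch_covariance(z))


def alignment_loss(z1: np.ndarray, z2: np.ndarray) -> float:
    """MCE between the covariance matrices of two augmented branches."""
    return matrix_cross_entropy(branch_covariance(z1), branch_covariance(z2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1 = rng.normal(size=(256, 8))
    z2 = z1 + 0.1 * rng.normal(size=(256, 8))  # a slightly perturbed "second view"
    print("uniformity:", uniformity_loss(z1))
    print("alignment:", alignment_loss(z1, z2))
```

A collapsed representation, where every embedding points in the same direction, yields a rank-one covariance and therefore a much larger uniformity loss than well-spread embeddings. This is the failure mode that the uniformity term is meant to penalize. A real training loop would compute these quantities in an autodiff framework so that gradients reach the encoder. NumPy is used here only to keep the sketch self-contained.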