Self-supervised learning (SSL) is an increasingly popular paradigm for representation learning. Recent methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family taking its own approach to avoiding informational collapse. While dimension-contrastive methods converge to solutions similar to those of sample-contrastive methods, some methods empirically require more training epochs to converge. To close this gap, we present FroSSL, an objective function that is both sample- and dimension-contrastive up to embedding normalization. FroSSL minimizes covariance Frobenius norms to avoid collapse and minimizes mean-squared error to enforce augmentation invariance. We show that FroSSL converges more quickly than a variety of other SSL methods, and we provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations under linear probe evaluation when used to train a ResNet18 on the CIFAR-10, CIFAR-100, STL-10, and ImageNet datasets.
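The abstract does not spell out the objective, so the following is only a minimal NumPy sketch of a loss in the spirit it describes: a Frobenius-norm term on the embedding covariance of each view plus a mean-squared-error term between views. The row-wise L2 normalization, the uncentered covariance, the logarithm, the equal weighting of the terms, and the names `l2_normalize_rows`, `log_frobenius_term`, and `frossl_style_loss` are all assumptions made for illustration and may differ from the authors' formulation.

```python
import numpy as np

def l2_normalize_rows(z, eps=1e-12):
    # Assumed normalization: scale every sample embedding to unit length.
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def log_frobenius_term(z):
    # Log of the squared Frobenius norm of the (uncentered) D x D covariance.
    # With unit-norm rows the trace of cov is 1, so this term is smallest when
    # the eigenvalues are spread evenly and largest when they collapse to one.
    n = z.shape[0]
    cov = z.T @ z / n
    return np.log(np.linalg.norm(cov, "fro") ** 2)

def frossl_style_loss(z1, z2):
    # z1, z2: (N, D) embeddings of two augmented views of the same N inputs.
    z1, z2 = l2_normalize_rows(z1), l2_normalize_rows(z2)
    # Augmentation invariance: mean squared distance between paired embeddings.
    invariance = np.mean(np.sum((z1 - z2) ** 2, axis=1))
    return log_frobenius_term(z1) + log_frobenius_term(z2) + invariance

rng = np.random.default_rng(0)
z = l2_normalize_rows(rng.normal(size=(64, 8)))

# The Frobenius norm of the D x D matrix Z^T Z equals that of the N x N Gram
# matrix Z Z^T, which is one way to see how a single penalty can be read as
# both dimension-contrastive and sample-contrastive.
dim_view = np.linalg.norm(z.T @ z, "fro")
sample_view = np.linalg.norm(z @ z.T, "fro")

# A fully collapsed batch (every sample identical) for comparison.
collapsed = l2_normalize_rows(np.tile(rng.normal(size=(1, 8)), (64, 1)))
```

In this sketch, `dim_view` and `sample_view` agree up to floating-point error, and `log_frobenius_term(collapsed)` is larger than `log_frobenius_term(z)`, which illustrates why minimizing the term discourages collapse. Since the squared Frobenius norm of a symmetric matrix is the sum of its squared eigenvalues, the penalty acts directly on the eigenvalue spectrum of the covariance.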