Self-supervised learning (SSL) is an increasingly popular paradigm for representation learning. Recent methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, and each family has its own approach to avoiding informational collapse. Although dimension-contrastive methods converge to solutions similar to those of sample-contrastive methods, it can be shown empirically that some methods require more training epochs to converge. To close this gap, we present FroSSL, an objective function that is both sample- and dimension-contrastive up to embedding normalization. FroSSL avoids collapse by minimizing the Frobenius norms of the embedding covariance matrices and enforces augmentation invariance by minimizing mean-squared error. We show that FroSSL converges more quickly than a variety of other SSL methods, and we provide theoretical and empirical support that this faster convergence stems from how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations under linear probe evaluation when used to train a ResNet18 on the CIFAR-10, CIFAR-100, STL-10, and ImageNet datasets.
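As a rough illustration of the two terms described above, the following NumPy sketch combines a covariance Frobenius-norm penalty with a mean-squared-error term between two augmented views. It is a minimal sketch under stated assumptions, not the reference implementation: the function names, the centering step, the scaling of each embedding matrix to unit Frobenius norm, and the use of a logarithm on the penalty are all choices made here for illustration.

```python
import numpy as np


def covariance_frobenius_sq(z):
    """Squared Frobenius norm of the (unnormalized) D x D covariance Z^T Z."""
    cov = z.T @ z
    return float(np.sum(cov ** 2))


def frossl_style_loss(z1, z2, eps=1e-12):
    """Illustrative loss for two views z1, z2 of shape (N, D).

    Assumptions (not taken from the source): embeddings are centered and
    scaled to unit Frobenius norm, and the covariance penalty is logged.
    """
    def standardize(z):
        z = z - z.mean(axis=0, keepdims=True)   # center each dimension
        return z / (np.linalg.norm(z) + eps)    # unit Frobenius norm overall

    z1, z2 = standardize(z1), standardize(z2)

    # Anti-collapse term. With unit Frobenius norm the covariance eigenvalues
    # sum to 1, so the squared Frobenius norm (sum of squared eigenvalues)
    # is smallest when the eigenvalues are spread evenly.
    reg = np.log(covariance_frobenius_sq(z1) + eps) \
        + np.log(covariance_frobenius_sq(z2) + eps)

    # Invariance term: mean squared distance between paired view embeddings.
    inv = np.mean(np.sum((z1 - z2) ** 2, axis=1))
    return float(reg + inv)


# Usage with random stand-in embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
view_a = rng.normal(size=(256, 8))
view_b = view_a + 0.1 * rng.normal(size=(256, 8))
loss = frossl_style_loss(view_a, view_b)
```

One reason a Frobenius-norm penalty can be read as both sample- and dimension-contrastive is the identity ||Z^T Z||_F = ||Z Z^T||_F: the D x D covariance over dimensions and the N x N Gram matrix over samples have the same Frobenius norm.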