Whitening loss offers a theoretical guarantee against feature collapse in self-supervised learning (SSL) with joint embedding architectures. A typical implementation is hard whitening, which applies a whitening transformation to the embedding and imposes the loss on the whitened output. In this paper, we propose the spectral transformation (ST) framework, which maps the spectrum of the embedding to a desired distribution during the forward pass and modulates that spectrum through an implicit gradient update during the backward pass. We show that the whitening transformation is, by definition, a special instance of ST, and our empirical investigation finds that other instances can also avoid collapse. Furthermore, we propose a new instance of ST, called IterNorm with trace loss (INTL). We prove theoretically that INTL avoids collapse and modulates the spectrum of the embedding towards an equal-eigenvalue distribution over the course of optimization. Moreover, INTL achieves 76.6% top-1 accuracy in linear evaluation on ImageNet with ResNet-50, exceeding the supervised baseline, and it does so with a batch size of only 256. Comprehensive experiments show that INTL is a promising SSL method in practice. The code is available at https://github.com/winci-ai/intl.
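To make the idea of mapping the spectrum of an embedding concrete, the following is a minimal NumPy sketch, not the released implementation. It compares the covariance eigenvalues of a toy embedding before and after hard whitening (via eigendecomposition) and after an IterNorm-style approximate whitening based on Newton's iteration. The function names, the toy data, the iteration count of 5, and in particular the form of `trace_loss` (a squared penalty on the diagonal of the whitened covariance) are illustrative assumptions and are not taken from the text above.

```python
import numpy as np


def covariance(z):
    """Covariance of a (d, m) embedding matrix: d features, m samples."""
    zc = z - z.mean(axis=1, keepdims=True)
    return zc @ zc.T / z.shape[1]


def hard_whiten(z, eps=1e-8):
    """ZCA whitening: every eigenvalue of the output covariance becomes ~1."""
    zc = z - z.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(covariance(z))
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return w @ zc


def iter_norm(z, num_iters=5):
    """Approximate whitening by Newton's iteration (no eigendecomposition)."""
    zc = z - z.mean(axis=1, keepdims=True)
    sigma = covariance(z)
    trace = np.trace(sigma)
    sigma_n = sigma / trace  # normalized so that eigenvalues lie in (0, 1]
    p = np.eye(z.shape[0])
    for _ in range(num_iters):
        # Newton step towards the inverse square root of sigma_n
        p = 0.5 * (3.0 * p - p @ p @ p @ sigma_n)
    return (p / np.sqrt(trace)) @ zc


def trace_loss(z_hat):
    """Assumed penalty: push each diagonal covariance entry towards 1."""
    diag = np.diag(covariance(z_hat))
    return float(np.sum((1.0 - diag) ** 2))


def spectrum(z):
    """Eigenvalues of the covariance, in ascending order."""
    return np.linalg.eigvalsh(covariance(z))


# Toy embedding with unevenly scaled features (hypothetical data).
rng = np.random.default_rng(0)
scales = np.array([[1.0], [0.8], [0.6], [0.5]])
z = scales * rng.standard_normal((4, 256))

print("input spectrum:  ", np.round(spectrum(z), 3))
print("hard whitening:  ", np.round(spectrum(hard_whiten(z)), 3))
print("IterNorm (T=5):  ", np.round(spectrum(iter_norm(z)), 3))
print("trace-style loss:", trace_loss(iter_norm(z)))
```

In this sketch, hard whitening maps every eigenvalue to 1 exactly (up to `eps`), whereas the iterative variant leaves eigenvalues slightly below 1 after a finite number of steps. A penalty such as the assumed `trace_loss` would then act on that residual gap.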