We present a simple picture of the training process of self-supervised learning methods with joint embedding networks. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of this model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a stepwise fashion, and obtain a closed-form expression for the final learned representations. Remarkably, we then see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, \textit{kernel PCA} may serve as a useful model of self-supervised learning.
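The stepwise picture described in the abstract can be illustrated with a small numerical experiment. The sketch below is not the authors' code. It assumes a linear embedding $f(x) = Wx$ and a Barlow Twins style loss of the form $\|W \Gamma W^\top - I\|_F^2$, where $\Gamma$ stands in for a symmetric feature cross-correlation matrix. The abstract does not spell out this loss, so the form is an assumption, as are the diagonal spectrum `gamma`, the function names, and all hyperparameters. Under these assumptions, gradient descent from a small random initialization should show the eigenvalues of the embedding correlation matrix rising from near 0 to near 1 one at a time, in order of the eigenvalues of $\Gamma$.

```python
import numpy as np


def train_linear_barlow_twins(gamma, d=3, init_scale=1e-6, lr=0.01,
                              steps=4000, seed=0):
    """Gradient descent on L(W) = ||W Gamma W^T - I||_F^2 from small init.

    gamma: assumed (hypothetical) eigenvalues of a diagonal Gamma.
    Returns the final W and the eigenvalues of C = W Gamma W^T at every
    step, sorted in descending order.
    """
    rng = np.random.default_rng(seed)
    Gamma = np.diag(gamma)
    W = init_scale * rng.standard_normal((d, len(gamma)))
    eye = np.eye(d)
    history = np.empty((steps + 1, d))
    for t in range(steps + 1):
        C = W @ Gamma @ W.T
        history[t] = np.linalg.eigvalsh(C)[::-1]  # descending order
        # For symmetric Gamma, grad_W L = 4 (C - I) W Gamma.
        W = W - lr * 4.0 * (C - eye) @ W @ Gamma
    return W, history


def step_times(history, lr, threshold=0.5):
    """Training time (lr * step) at which each eigenvalue first exceeds threshold."""
    return [lr * int(np.argmax(history[:, k] > threshold))
            for k in range(history.shape[1])]


# Hypothetical spectrum: three dominant modes followed by three weak ones.
gamma = np.array([1.0, 0.5, 0.25, 0.02, 0.01, 0.005])
W, history = train_linear_barlow_twins(gamma)
print("step times:", step_times(history, lr=0.01))
print("final eigenvalues of C:", history[-1])
```

Plotting the columns of `history` against training time should give a staircase: early in training each mode's amplitude grows roughly exponentially at a rate proportional to its eigenvalue of $\Gamma$, so modes with smaller eigenvalues reach order one later. In this toy setting the learned `W` ends up concentrated on the top `d` eigendirections of $\Gamma$, which is the sense in which the abstract relates self-supervised learning to kernel PCA.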