We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of this model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a stepwise fashion, and obtain a closed-form expression for the final learned representations. Remarkably, we then see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, kernel PCA may serve as a useful model of self-supervised learning.
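The stepwise behavior described above can be reproduced in a toy setting. The sketch below is a minimal illustration, not the paper's code: it assumes the linearized Barlow Twins objective takes the form L(W) = ||W Γ W^T - I||_F^2, where W is a linear map to a d-dimensional embedding and Γ is a symmetric feature cross-correlation matrix. Here Γ is synthetic, built from a hand-picked positive spectrum, and the function name `simulate_stepwise`, the step size, and the initialization scale are all illustrative choices. Under these assumptions, plain gradient descent from a small random initialization should drive the eigenvalues of the embedding correlation C = W Γ W^T from near 0 to 1 one at a time, with larger eigenvalues of Γ learned earlier.

```python
import numpy as np


def simulate_stepwise(gamma_eigs, d=3, init_scale=1e-4, lr=0.01, steps=4000, seed=0):
    """Gradient descent on L(W) = ||W Gamma W^T - I||_F^2 from a small random init.

    Returns the eigenvalues of C = W Gamma W^T at every step (sorted descending),
    the final W, and the eigenvectors U of the synthetic Gamma.
    """
    rng = np.random.default_rng(seed)
    m = len(gamma_eigs)

    # Synthetic symmetric matrix Gamma with a known spectrum (an assumption made
    # for illustration; in practice Gamma would be estimated from augmented data).
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    Gamma = U @ np.diag(gamma_eigs) @ U.T

    # Small initialization: every entry of W starts close to zero.
    W = init_scale * rng.standard_normal((d, m))

    history = np.empty((steps, d))
    for t in range(steps):
        C = W @ Gamma @ W.T
        history[t] = np.sort(np.linalg.eigvalsh(C))[::-1]
        # Gradient of ||C - I||_F^2 with respect to W, using Gamma = Gamma^T.
        W -= lr * 4.0 * (C - np.eye(d)) @ W @ Gamma
    return history, W, U


# Three well-separated top eigenvalues followed by a tail of small ones.
gamma_eigs = np.array([1.0, 0.5, 0.25, 0.05, 0.04, 0.03, 0.02, 0.01])
history, W, U = simulate_stepwise(gamma_eigs)

# First step at which the k-th largest eigenvalue of C exceeds 0.5.
learn_steps = [int(np.argmax(history[:, k] > 0.5)) for k in range(history.shape[1])]

# Share of the squared norm of W that lies in the top-3 eigenspace of Gamma.
top_fraction = np.linalg.norm(W @ U[:, :3]) ** 2 / np.linalg.norm(W) ** 2

print("step at which each embedding eigenvalue passes 0.5:", learn_steps)
print("final eigenvalues of C:", history[-1])
print("fraction of W in the top-3 eigenspace:", top_fraction)
```

Plotting the columns of `history` against the step index shows the staircase directly: each curve stays near zero, rises sharply, and then plateaus at 1. In this toy run the learned map concentrates on the leading eigenvectors of Γ, which is the sense in which the procedure resembles kernel PCA.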