Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However, current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural-network-based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dynamics of multivariate regression to SSL leads to learning trivial scalar representations, which demonstrates dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network-width-independent) learning dynamics of SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite-width approximation of SSL models deviates significantly from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
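To make the idea of gradient descent under orthogonality constraints concrete, the following is a minimal sketch and not the paper's formulation. It assumes a linear encoder z = Wᵀx with WᵀW = I and a toy non-contrastive alignment loss L(W) = -tr(WᵀCW), where C is the symmetrised cross-covariance between two views of the data. The function name `riemannian_gd`, the loss, the QR retraction, the step size, and the synthetic data are all illustrative assumptions. Because this loss is unchanged when W is rotated within its column space, it depends only on the subspace spanned by W, which is the sense in which the optimisation lives on the Grassmannian.

```python
import numpy as np

def riemannian_gd(C, k, steps=200, lr=0.05, seed=0):
    """Toy gradient descent with orthonormal weights (illustrative only).

    Minimises L(W) = -trace(W^T C W) subject to W^T W = I for a symmetric
    d x d matrix C. This loss is an assumption for the sketch, not the
    objective analysed in the paper.
    """
    rng = np.random.default_rng(seed)
    d = C.shape[0]
    # Random orthonormal starting point (d x k, orthonormal columns).
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))
    losses = []
    for _ in range(steps):
        losses.append(-np.trace(W.T @ C @ W))
        G = -2.0 * C @ W                  # Euclidean gradient of the loss
        R = G - W @ (W.T @ G)             # projection onto the tangent space
        W, _ = np.linalg.qr(W - lr * R)   # QR retraction restores W^T W = I
    return W, np.array(losses)

# Synthetic "two views" of the same data: a sample and a noisy copy of it.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 8))
X_aug = X + 0.1 * rng.standard_normal(X.shape)

# Symmetrised cross-covariance between the two views.
C = (X.T @ X_aug + X_aug.T @ X) / (2 * len(X))

W, losses = riemannian_gd(C, k=3)
print("orthonormal:", np.allclose(W.T @ W, np.eye(3)))
print("loss decreased:", losses[-1] < losses[0])
```

The retraction step is what keeps the representation from shrinking to a lower-dimensional solution in this sketch: the columns of W stay orthonormal at every iteration, so all k output dimensions remain in use.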