Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study the learning dynamics under both Euclidean and cosine similarity losses in the eigenspace of closed-form linear predictor networks. We show that both losses avoid collapse through implicit variance regularization, albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up the initial learning dynamics and increases robustness, thereby allowing us to dispense with the EMA target network typically used with non-contrastive methods. Our analysis sheds light on the variance regularization mechanisms of non-contrastive SSL and lays the theoretical grounds for crafting novel loss functions that shape the learning dynamics of the predictor's spectrum.
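The statement that eigenvalues act as effective learning rate multipliers can be made concrete with a generic toy problem. The sketch below is not the IsoLoss objective or the derivation from this work; it assumes a plain quadratic loss that is already diagonal in its eigenbasis, and the function name `per_mode_error` and the `equalize` flag are hypothetical names introduced only for illustration. Under gradient descent, each mode shrinks by a factor (1 - lr * eigenvalue) per step, so modes with small eigenvalues converge slowly; dividing the gradient by the eigenvalue gives every mode the same rate, which is the kind of equalization the abstract describes.

```python
import numpy as np


def per_mode_error(eigvals, lr, steps, equalize=False):
    """Gradient descent on the toy loss 0.5 * sum_i eigvals[i] * w[i]**2.

    Every mode starts at w = 1 and the optimum is w = 0, so the returned
    |w| is the remaining error in each eigenmode after `steps` updates.
    """
    w = np.ones_like(eigvals, dtype=float)
    for _ in range(steps):
        grad = eigvals * w  # the eigenvalue scales the gradient of its mode
        if equalize:
            # Hypothetical rescaling (an assumption for this toy problem):
            # dividing by the eigenvalue removes the per-mode multiplier.
            grad = grad / eigvals
        w = w - lr * grad
    return np.abs(w)


eigvals = np.array([1.0, 0.1, 0.01])
plain = per_mode_error(eigvals, lr=0.5, steps=20)
equal = per_mode_error(eigvals, lr=0.5, steps=20, equalize=True)

print("plain gradient descent:", plain)
print("equalized rates:       ", equal)
```

With plain gradient descent the remaining error grows as the eigenvalue shrinks, whereas the rescaled update leaves the same error in all three modes.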