Sentence Representation Learning (SRL) is a crucial task in Natural Language Processing (NLP), and contrastive Self-Supervised Learning (SSL) is currently a mainstream approach to it. However, the reasons behind its remarkable effectiveness remain unclear. In other research fields, contrastive SSL is similar to non-contrastive SSL (e.g., alignment & uniformity, Barlow Twins, and VICReg) in both theory and practical performance. In SRL, by contrast, contrastive SSL significantly outperforms non-contrastive SSL. Two questions therefore arise. First, what commonalities enable various contrastive losses to achieve superior performance in SRL? Second, how can non-contrastive SSL, which is similar to contrastive SSL but ineffective in SRL, be made effective? To address these questions, we start from the perspective of gradients and find that four effective contrastive losses can be integrated into a unified paradigm, which depends on three components: the Gradient Dissipation, the Weight, and the Ratio. We then conduct an in-depth analysis of the roles these components play in optimization and experimentally demonstrate their significance for model performance. Finally, by adjusting these components, we enable non-contrastive SSL to achieve outstanding performance in SRL.