Sentence Representation Learning (SRL) is a crucial task in Natural Language Processing (NLP), where contrastive Self-Supervised Learning (SSL) is currently the mainstream approach. However, the reasons behind its remarkable effectiveness remain unclear. In particular, many studies have investigated the similarities between contrastive and non-contrastive SSL from a theoretical perspective. Such similarities can be verified in classification tasks, where the two approaches achieve comparable performance. However, in ranking tasks (i.e., Semantic Textual Similarity (STS) in SRL), contrastive SSL significantly outperforms non-contrastive SSL. Two questions therefore arise: First, *what commonalities enable various contrastive losses to achieve superior performance in STS?* Second, *how can we make non-contrastive SSL also effective in STS?* To address these questions, we start from the perspective of gradients and discover that four effective contrastive losses can be integrated into a unified paradigm, which depends on three components: the **Gradient Dissipation**, the **Weight**, and the **Ratio**. We then conduct an in-depth analysis of the roles these components play in optimization and experimentally demonstrate their significance for model performance. Finally, by adjusting these components, we enable non-contrastive SSL to achieve outstanding performance in STS.
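As a concrete point of reference for the contrastive objectives discussed above, the following is a minimal NumPy sketch of the InfoNCE loss commonly used in contrastive sentence representation learning (e.g., by SimCSE). It assumes cosine similarity with a temperature of 0.05 and uses other in-batch examples as negatives; the function name and defaults are illustrative, not taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.05):
    """InfoNCE over a batch: each anchor's positive is the same-index
    row of `positives`; all other rows serve as in-batch negatives."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sims = a @ p.T / temperature          # (N, N) similarity matrix
    # row-wise log-softmax; the diagonal holds the positive pairs
    logits = sims - sims.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

A loss of this form pulls each positive pair together while pushing apart all other in-batch pairs; it is one of the losses whose gradients the unified paradigm decomposes into the three components above.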