Semantic Textual Similarity (STS) constitutes a critical research direction in computational linguistics and serves as a key indicator of the encoding capabilities of embedding models. Driven by advances in pre-trained language models and contrastive learning techniques, leading sentence representation methods have already achieved average Spearman's correlation scores of approximately 86 across the seven STS benchmarks in SentEval. However, further improvements have become increasingly marginal, with no existing method attaining an average score higher than 87 on these tasks. This paper conducts an in-depth analysis of this phenomenon and concludes that the upper limit of Spearman's correlation scores attainable with contrastive learning is 87.5. To transcend this ceiling, we propose an innovative approach termed Pcc-tuning, which employs Pearson's correlation coefficient as a loss function to refine model performance beyond contrastive learning. Experimental results demonstrate that Pcc-tuning markedly surpasses previous state-of-the-art strategies, raising the Spearman's correlation score to above 90.
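The core idea of using Pearson's correlation coefficient as a training objective can be sketched as follows. This is a minimal illustration of the quantity being optimized, not the paper's actual implementation: the function name `pearson_loss` and the toy score vectors are assumptions for illustration, and in practice the same formula would be expressed in an autodiff framework so gradients flow back into the encoder.

```python
import numpy as np

def pearson_loss(preds, labels):
    """Negative Pearson correlation between predicted similarities
    and gold similarity scores; minimizing it maximizes the correlation.

    Unlike the rank-based Spearman metric reported at evaluation time,
    Pearson's r is a smooth function of the predictions, so it can serve
    directly as a loss.
    """
    p = np.asarray(preds, dtype=float)
    g = np.asarray(labels, dtype=float)
    # Center both vectors; Pearson's r is the cosine of the centered vectors.
    p = p - p.mean()
    g = g - g.mean()
    r = (p @ g) / (np.linalg.norm(p) * np.linalg.norm(g) + 1e-8)
    return -r

# Predictions that are a perfect linear function of the gold scores
# give r = 1, i.e. the minimum loss of -1.
print(round(pearson_loss([0.1, 0.4, 0.9], [1.0, 2.5, 5.0]), 4))  # → -1.0
```

Because Pearson's r is invariant to affine rescaling of the predictions, this objective only asks the model's similarity scores to be linearly consistent with the human ratings, rather than to match them in absolute value.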