High-quality text embeddings are pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, existing text embedding models commonly suffer from vanishing gradients, primarily because their optimization objectives rely on the cosine function, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This approach effectively mitigates the adverse effects of the saturation zones of the cosine function, which can impede gradient flow and hinder optimization. To set up a comprehensive STS evaluation, we experiment on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Across these short-text, long-text, and domain-specific STS tasks, AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
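The saturation argument can be illustrated with a small numerical sketch. It relies only on the standard identity d/dθ cos(θ) = -sin(θ), so the gradient of a cosine-based objective with respect to the angle shrinks toward zero as θ approaches 0 or π. The function names below are hypothetical, and `complex_angle_difference` is only one possible reading of "angle optimization in a complex space" (it assumes the first half of an embedding holds real parts and the second half holds imaginary parts); it is not taken from the paper's actual objective.

```python
import cmath
import math


def cos_grad_wrt_angle(theta):
    """Derivative of cos(theta) with respect to theta, i.e. -sin(theta)."""
    return -math.sin(theta)


def angle_between(u, v):
    """Angle in radians between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def complex_angle_difference(u, v):
    """Mean absolute phase of z / w, where z and w are complex views of u and v.

    Assumption (not from the source): the first half of each vector is the
    real part and the second half is the imaginary part. Components of w
    must be non-zero, otherwise the division raises ZeroDivisionError.
    """
    half = len(u) // 2
    z = [complex(re, im) for re, im in zip(u[:half], u[half:])]
    w = [complex(re, im) for re, im in zip(v[:half], v[half:])]
    phases = [abs(cmath.phase(a / b)) for a, b in zip(z, w)]
    return sum(phases) / len(phases)


if __name__ == "__main__":
    # Near 0 and pi the cosine is flat, so its gradient magnitude is tiny.
    for theta in (0.01, math.pi / 2, math.pi - 0.01):
        print(f"theta={theta:.4f}  |d cos/d theta|={abs(cos_grad_wrt_angle(theta)):.4f}")
    print(complex_angle_difference([1.0, 0.0], [0.0, 1.0]))
```

Running the script shows the gradient magnitude close to zero at both ends of the range and close to one in the middle, which is the behavior the saturation zones refer to. An objective defined on the angle itself does not flatten in this way, which is the intuition behind optimizing angles rather than cosines.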