Sentence embeddings are crucial for measuring semantic similarity. Most recent studies have employed large language models (LLMs) to learn sentence embeddings, yet existing LLMs mainly adopt an autoregressive architecture without explicit backward dependency modeling. We therefore examine the effects of backward dependencies in LLMs on semantic similarity measurement. Concretely, we propose a novel model, the backward dependency enhanced large language model (BeLLM), which learns sentence embeddings by transforming specific attention layers from uni-directional to bi-directional. We experiment extensively across various semantic textual similarity (STS) tasks and downstream applications, where BeLLM achieves state-of-the-art performance, showing that autoregressive LLMs benefit from backward dependencies when learning sentence embeddings.
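To make the core idea concrete, the sketch below builds a toy two-layer encoder in which the first attention layer remains causal (autoregressive) while the last layer is made bidirectional before pooling a sentence embedding. This is a minimal illustration of turning specific attention layers from uni- to bi-directional, not the paper's released architecture; the module names, layer sizes, and mean pooling are assumptions for demonstration only.

```python
# Illustrative sketch only (not the authors' code): a toy transformer whose
# first attention layer is causal and whose last layer is bidirectional,
# mimicking the idea of adding backward dependencies before pooling a
# sentence embedding. All names, sizes, and pooling choices are assumed.
import torch
import torch.nn as nn


class ToyAttentionLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, causal: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.causal = causal

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Causal layers mask future positions; the bidirectional layer
        # attends over the full sequence, adding backward dependencies.
        mask = None
        if self.causal:
            mask = torch.triu(
                torch.full((seq_len, seq_len), float("-inf")), diagonal=1
            )
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return self.norm(x + out)


class ToyBackwardEnhancedEncoder(nn.Module):
    def __init__(self, vocab_size: int = 1000, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.causal_layer = ToyAttentionLayer(d_model, n_heads, causal=True)
        self.bidirectional_layer = ToyAttentionLayer(d_model, n_heads, causal=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        h = self.causal_layer(h)          # uni-directional, as in standard LLMs
        h = self.bidirectional_layer(h)   # converted layer with backward dependency
        return h.mean(dim=1)              # pool token states into a sentence embedding


if __name__ == "__main__":
    encoder = ToyBackwardEnhancedEncoder()
    sent_a = torch.randint(0, 1000, (1, 8))
    sent_b = torch.randint(0, 1000, (1, 8))
    emb_a, emb_b = encoder(sent_a), encoder(sent_b)
    # Cosine similarity of pooled embeddings, as in STS-style evaluation.
    print(torch.cosine_similarity(emb_a, emb_b).item())
```

In the full model, such a conversion would be applied to selected layers of a pretrained autoregressive LLM rather than a toy network, but the masking change shown here is the mechanism the abstract describes.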