Sentence embedding is a fundamental task in natural language processing (NLP), widely used in search engines, expert systems, and question-answering platforms. With the continuing evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. However, these advances mainly concern fine-tuning scenarios, and computationally efficient direct-inference methods for sentence representation remain largely unexplored. This paper aims to bridge this gap. Through comprehensive experiments, we challenge the widely held belief that an Explicit One-word Limitation is necessary for deriving sentence embeddings from Pre-trained Language Models (PLMs). We show that this limitation, while beneficial for generative models under direct inference, is not necessary for discriminative models or for fine-tuning generative PLMs. This finding sheds new light on the design of manual templates in future studies. Building on this insight, we propose two prompt engineering techniques that further enhance the expressive power of PLMs' raw embeddings: Pretended Chain of Thought and Knowledge Enhancement. We confirm their effectiveness across various PLM types and analyze in detail the factors underlying their success.
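To make the three template ideas concrete, here is a minimal sketch of how the prompts might be assembled for direct inference. It assumes that the Explicit One-word Limitation takes the form of a PromptEOL-style suffix ("means in one word:"). It also assumes that Pretended Chain of Thought and Knowledge Enhancement are implemented as text prefixed to that template, and that the raw embedding is the final token's hidden state. The template wording, the `PREFIXES` table, and the helpers `build_prompt` and `last_token_embedding` are hypothetical illustrations, not the paper's exact templates or code. Loading the PLM and running the forward pass are not shown.

```python
from typing import List, Sequence

# Hypothetical templates (assumed wording, modeled on PromptEOL-style prompts).
ONE_WORD = 'this sentence : "{s}" means in one word:"'  # explicit one-word limitation
NO_LIMIT = 'this sentence : "{s}" means:"'              # same template without it

# Hypothetical prefixes standing in for the two proposed techniques.
PREFIXES = {
    "base": "",
    "pretended_cot": "After thinking step by step, ",
    "knowledge": (
        "The essence of a sentence is often captured by its main subjects "
        "and actions, while descriptive terms provide additional but less "
        "central details. With this in mind, "
    ),
}


def build_prompt(sentence: str, technique: str = "base", one_word: bool = True) -> str:
    """Wrap a sentence in a manual template and capitalize the first character."""
    body = (ONE_WORD if one_word else NO_LIMIT).format(s=sentence)
    text = PREFIXES[technique] + body
    return text[0].upper() + text[1:]


def last_token_embedding(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Use the last token's hidden state (one vector per token) as the sentence embedding."""
    return list(hidden_states[-1])
```

For example, `build_prompt("A cat sits.", "pretended_cot")` yields `After thinking step by step, this sentence : "A cat sits." means in one word:"`. In a real pipeline, `hidden_states` would be the final-layer outputs of a generative PLM for that prompt. Setting `one_word=False` gives the unconstrained variant whose necessity the abstract questions.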