Sentence embedding is a fundamental task in Natural Language Processing, with extensive application in search engines, expert systems, and question-answering platforms. With the continuous evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. However, these advancements mainly pertain to fine-tuning scenarios, leaving explorations of computationally efficient direct-inference methods for sentence representation at a nascent stage. This paper endeavors to bridge this research gap. Through comprehensive experimentation, we challenge the widely held belief in the necessity of an Explicit One-word Limitation for deriving sentence embeddings from Pre-trained Language Models (PLMs). We demonstrate that this approach, while beneficial for generative models under the direct-inference scenario, is not imperative for discriminative models or for fine-tuning generative PLMs. This discovery sheds new light on the design of manual templates in future studies. Building upon this insight, we propose two innovative prompt-engineering techniques capable of further enhancing the expressive power of PLMs' raw embeddings: Pretended Chain of Thought and Knowledge Enhancement. We confirm their effectiveness across various PLM types and provide a detailed exploration of the underlying factors contributing to their success.
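To make the contrast between the template styles concrete, the following is a minimal sketch of how such prompt templates might be constructed. The wording of each template is illustrative, not the paper's verbatim phrasing; in practice the filled prompt is fed to the PLM and the hidden state of the final token is taken as the sentence embedding.

```python
def one_word_prompt(sentence: str) -> str:
    # Explicit One-word Limitation: ask the model to compress the
    # sentence's meaning into a single word, then read off the last
    # token's hidden state as the embedding (illustrative wording).
    return f'This sentence : "{sentence}" means in one word:"'

def pretended_cot_prompt(sentence: str) -> str:
    # Pretended Chain of Thought: prepend a step-by-step cue without
    # generating any intermediate reasoning tokens (illustrative wording).
    return f'After thinking step by step, this sentence : "{sentence}" means in one word:"'

def knowledge_enhanced_prompt(sentence: str) -> str:
    # Knowledge Enhancement: state, in the prompt itself, what a good
    # sentence representation should capture (illustrative wording).
    return (
        "The essence of a sentence is often captured by its main subjects "
        "and actions, while descriptive terms provide additional but less "
        f'critical details. With this in mind, this sentence : "{sentence}" '
        'means in one word:"'
    )

print(pretended_cot_prompt("A cat sits on the mat."))
```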