Text embedding has become a foundational technology in natural language processing (NLP) in the deep learning era, driving advances across a wide range of downstream tasks. Although many natural language understanding problems can now be cast in a generative paradigm that exploits the strong generation and comprehension capabilities of large language models (LLMs), many practical applications, such as semantic matching, clustering, and information retrieval, still rely on text embeddings for their efficiency and effectiveness. Integrating LLMs with text embeddings has therefore become a major research focus in recent years. In this survey, we group the interplay between LLMs and text embeddings into three overarching themes: (1) LLM-augmented text embedding, which enhances traditional embedding methods with LLMs; (2) LLMs as text embedders, which adapt the innate capabilities of LLMs to produce high-quality embeddings; and (3) text embedding understanding with LLMs, which uses LLMs to analyze and interpret embeddings. By organizing recent work according to interaction patterns rather than specific downstream applications, we offer a novel and systematic overview of contributions from diverse research and application domains in the era of LLMs. Furthermore, we highlight the unresolved challenges that persist from the pre-LLM era of pre-trained language models (PLMs) and examine the new obstacles introduced by LLMs. Building on this analysis, we outline prospective directions for the evolution of text embedding, addressing both theoretical and practical opportunities in the rapidly advancing landscape of NLP.