Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced their generative capabilities across various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first framework to enable LLMs to conduct vector retrieval during generation.
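To make the core idea concrete, the sketch below illustrates one plausible reading of the mechanism: during autoregressive decoding, whenever the model emits a dedicated retrieval token, the hidden state at that position is reused as a dense query vector against a document index, so retrieval happens inside the same forward pass as generation. The token name `[RET]`, the function names, and the toy hidden states are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

RET_TOKEN = "[RET]"  # hypothetical retrieval token; the real special token may differ

def generate_with_retrieval(tokens, hidden_states, doc_embeddings):
    """Toy sketch of one-pass generation + retrieval (assumed mechanism):
    for each generated token equal to RET_TOKEN, treat its hidden state
    as the query embedding and return the index of the best-matching
    document under cosine similarity."""
    retrieved = []
    # pre-normalize the document index once
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    for tok, h in zip(tokens, hidden_states):
        if tok == RET_TOKEN:
            q = h / np.linalg.norm(h)      # normalize the query hidden state
            scores = docs @ q              # cosine similarity to every document
            retrieved.append(int(np.argmax(scores)))
    return retrieved

# toy demo: a 3-token generated sequence with 4-dim hidden states and 2 indexed docs
tokens = ["The", "[RET]", "answer"]
hidden = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0]], dtype=float)
docs = np.array([[0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)
print(generate_with_retrieval(tokens, hidden, docs))  # → [0]
```

In a real model, the retrieval-token hidden states and the document embeddings would come from the same LLM, which is what lets a single set of weights serve both tasks without a separate retriever.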