Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose \textbf{RetroLLM}, a unified framework that integrates retrieval and generation into a single, cohesive process, enabling LLMs to directly generate fine-grained evidence from the corpus with constrained decoding. Moreover, to mitigate false pruning in the process of constrained evidence generation, we introduce (1) hierarchical FM-Index constraints, which generate corpus-constrained clues to identify a subset of relevant documents before evidence generation, reducing irrelevant decoding space; and (2) a forward-looking constrained decoding strategy, which considers the relevance of future sequences to improve evidence accuracy. Extensive experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks. The code is available at \url{https://github.com/sunnynexus/RetroLLM}.