LLatrieval: LLM-Verified Retrieval for Verifiable Generation

Verifiable generation aims to have a large language model (LLM) generate text together with supporting documents, so that the user can flexibly verify the answer and the answer becomes more trustworthy. Its evaluation measures not only the correctness of the answer but also its verifiability, i.e., how well the answer is supported by the corresponding documents. Typically, verifiable generation adopts a retrieval-read pipeline with two stages: 1) retrieve documents relevant to the question; 2) generate the answer based on those documents. Because the retrieved documents both supplement the LLM's knowledge and serve as evidence, the retrieval stage is essential for the correctness and verifiability of the answer. However, widely used retrievers become the bottleneck of the entire pipeline and limit overall performance. They often have fewer parameters than the LLM and have not been shown to scale well to the size of LLMs. Since the LLM passively receives the retrieval result, if the retriever fails to find the supporting documents, the LLM cannot generate a correct and verifiable answer, which overshadows the LLM's remarkable abilities. In this paper, we propose LLatrieval (Large Language Model Verified Retrieval), in which the LLM updates the retrieval result until it verifies that the retrieved documents can support answering the question. The LLM thus iteratively provides feedback to retrieval and steers the retrieval result toward sufficiently supporting verifiable generation. Experimental results show that our method significantly outperforms a wide range of baselines and achieves new state-of-the-art results.
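The verify-then-update idea described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`verified_retrieval`, `toy_retrieve`, `toy_verify`, `toy_revise`), the three-sentence corpus, the word-overlap retriever, and the rule-based verifier are all hypothetical stand-ins chosen for illustration. In a real system the verifier and the query revision would presumably be LLM calls and the retriever a dense or sparse index.

```python
import re
from typing import Callable, List, Tuple

def verified_retrieval(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    verify: Callable[[str, List[str]], bool],
    revise_query: Callable[[str, List[str]], str],
    k: int = 2,
    max_rounds: int = 3,
) -> Tuple[List[str], bool]:
    """Update the retrieval result until `verify` accepts it or the budget runs out."""
    docs = retrieve(question, k)
    for _ in range(max_rounds):
        if verify(question, docs):
            return docs, True
        # Verification failed: produce a new query and retrieve again.
        docs = retrieve(revise_query(question, docs), k)
    return docs, verify(question, docs)

# --- Toy stand-ins (assumptions, not part of the original method) ---
CORPUS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]

def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(query: str, k: int) -> List[str]:
    # Rank documents by the number of words shared with the query.
    q = _tokens(query)
    return sorted(CORPUS, key=lambda d: len(q & _tokens(d)), reverse=True)[:k]

def toy_verify(question: str, docs: List[str]) -> bool:
    # Stand-in for an LLM judgment: accept once some document states the year.
    return any("1889" in d for d in docs)

def toy_revise(question: str, docs: List[str]) -> str:
    # Stand-in for LLM feedback: add the terms missing from the first attempt.
    return question + " Eiffel Tower completed"

if __name__ == "__main__":
    q = "What year did the landmark in the capital of France open?"
    print(verified_retrieval(q, toy_retrieve, toy_verify, toy_revise, k=1))
```

Passing the retriever, verifier, and query reviser as callables keeps the loop independent of any particular model or index, and `max_rounds` bounds the cost when no retrieval result ever passes verification.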
