Retrieval-augmented generation (RAG) encounters challenges when addressing complex queries, particularly multi-hop questions. While several methods tackle multi-hop queries by iteratively generating internal queries and retrieving external documents, these approaches are computationally expensive. In this paper, we identify a three-stage information processing pattern in LLMs during layer-by-layer reasoning, consisting of extraction, processing, and subsequent extraction steps. This observation suggests that the representations in intermediate layers contain richer information than those in other layers. Building on this insight, we propose Layer-wise RAG (L-RAG). Unlike prior methods that focus on generating new internal queries, L-RAG leverages intermediate representations from the middle layers, which capture next-hop information, to retrieve external knowledge. L-RAG achieves performance comparable to multi-step approaches while maintaining inference overhead similar to that of standard RAG. Experimental results show that L-RAG outperforms existing RAG methods on open-domain multi-hop question-answering datasets, including MuSiQue, HotpotQA, and 2WikiMultiHopQA. The code is available at https://github.com/Olive-2019/L-RAG.
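The core retrieval step described above can be illustrated with a minimal sketch. The function name, mean-pooling strategy, and cosine-similarity scoring below are illustrative assumptions, not the paper's exact implementation: the idea is simply that a middle layer's hidden states, rather than a newly generated text query, serve as the dense query for the next-hop retrieval.

```python
import numpy as np

def retrieve_with_intermediate_layer(hidden_states, doc_embeddings, layer_idx, top_k=2):
    """Illustrative L-RAG-style retrieval: use a middle layer's token
    representations (mean-pooled here, an assumption) as the dense query.

    hidden_states:  (num_layers, seq_len, dim) per-layer token representations
                    from one forward pass (simulated in the test below).
    doc_embeddings: (num_docs, dim) document vectors, assumed L2-normalized.
    Returns the indices of the top_k highest-scoring documents.
    """
    query = hidden_states[layer_idx].mean(axis=0)   # pool tokens -> (dim,)
    query = query / np.linalg.norm(query)           # normalize for cosine similarity
    scores = doc_embeddings @ query                 # (num_docs,) similarity scores
    return np.argsort(-scores)[:top_k]              # best-matching document indices
```

In a real pipeline the hidden states would come from the LLM's forward pass (e.g., with hidden-state outputs enabled), and the retrieved documents would be appended to the context before generation continues, keeping the overhead close to a single standard RAG pass.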