Existing large language models (LLMs) show exceptional problem-solving capabilities but may struggle with complex reasoning tasks. Despite the successes of chain-of-thought and tree-based search methods, they mainly depend on the internal knowledge of LLMs to search over intermediate reasoning steps, which limits them to simple tasks involving fewer reasoning steps. In this paper, we propose \textbf{RAG-Star}, a novel RAG approach that integrates retrieved information to guide a tree-based deliberative reasoning process that relies on the inherent knowledge of LLMs. By leveraging Monte Carlo Tree Search, RAG-Star iteratively plans intermediate sub-queries and answers for reasoning based on the LLM itself. To consolidate internal and external knowledge, we propose a retrieval-augmented verification method that utilizes query- and answer-aware reward modeling to provide feedback for the inherent reasoning of LLMs. Our experiments involving Llama-3.1-8B-Instruct and GPT-4o demonstrate that RAG-Star significantly outperforms previous RAG and reasoning methods.
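To make the search-plus-verification loop concrete, the following is a minimal sketch of an MCTS step in which each tree node carries a sub-query and a candidate answer, and a toy retrieval-augmented verifier scores answers against retrieved evidence. All names (`Node`, `verify`, `mcts_step`, `propose`) are illustrative assumptions for exposition, not the authors' implementation; the LLM's sub-query/answer generation and the learned reward model are stubbed out.

```python
import math
import random

class Node:
    """A search-tree node holding one intermediate sub-query and answer."""
    def __init__(self, sub_query, answer=None, parent=None):
        self.sub_query = sub_query
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Standard UCT score; unvisited children are explored first.
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def verify(answer, evidence):
    """Toy stand-in for retrieval-augmented verification: the reward is the
    fraction of retrieved-evidence terms covered by the candidate answer."""
    terms = evidence.split()
    return sum(t in answer for t in terms) / len(terms)

def mcts_step(root, propose, evidence):
    """One MCTS iteration: select a leaf by UCB, expand it with LLM-proposed
    sub-query/answer pairs (stubbed by `propose`), score one child with the
    verifier, and back-propagate the reward to the root."""
    node = root
    while node.children:  # selection
        node = max(node.children, key=lambda n: n.ucb())
    for sq, ans in propose(node):  # expansion
        node.children.append(Node(sq, ans, parent=node))
    child = random.choice(node.children)  # simulation (stubbed)
    reward = verify(child.answer or "", evidence)
    while child is not None:  # back-propagation
        child.visits += 1
        child.value += reward
        child = child.parent
```

In the actual method, `propose` would be the LLM planning intermediate sub-queries, and `verify` would be replaced by query- and answer-aware reward modeling over retrieved documents; this skeleton only shows how such feedback can steer the tree search.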