This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework rStar, designed to improve the reasoning accuracy and factual integrity of large language models (LLMs) on complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE adds two new actions to the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries from the initial problem statement, retrieves information with those queries, and augments reasoning with the retrieved passages to formulate the final answer; and A7, which performs retrieval specifically for generated sub-questions and re-answers them with the relevant contextual information. In addition, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experiments with LLaMA 3.1 show that RARE enables open-source LLMs to achieve performance competitive with leading proprietary models such as GPT-4 and GPT-4o. These results establish RARE as a scalable approach for improving LLMs in domains where logical coherence and factual integrity are critical.
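The two retrieval actions can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy word-overlap retriever, the `Node` structure, and all function names are assumptions standing in for a real retriever and LLM-driven query/sub-question generation.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for a real retrieval index (assumption).
CORPUS = [
    "Aspirin inhibits platelet aggregation.",
    "Photosynthesis converts light energy into chemical energy.",
    "The hippocampus is involved in memory formation.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank corpus passages by naive word overlap with the query
    (a stand-in for a real sparse/dense retriever)."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda p: -len(q & set(p.lower().split())))
    return ranked[:k]

@dataclass
class Node:
    """An MCTS node carrying the question, retrieved context, and sub-questions."""
    question: str
    context: list[str] = field(default_factory=list)
    sub_questions: list[str] = field(default_factory=list)

def action_a6(node: Node) -> Node:
    """A6: form search queries from the initial problem statement, retrieve,
    and attach passages so the final answer is generated with them."""
    queries = [node.question]  # a real system would generate several queries
    for q in queries:
        node.context.extend(retrieve(q))
    return node

def action_a7(node: Node) -> Node:
    """A7: retrieve for each generated sub-question and re-answer it
    with the retrieved context."""
    for sq in node.sub_questions:
        node.context.extend(retrieve(sq))
    return node

node = Node(
    question="What role does the hippocampus play in memory?",
    sub_questions=["Which brain region supports memory formation?"],
)
node = action_a7(action_a6(node))
print(node.context)
```

In the full method these actions are candidate moves inside MCTS rollouts, so retrieval is interleaved with step-by-step reasoning rather than performed once up front.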
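The factuality-based path selection can likewise be sketched. This is a hedged, self-contained toy: the word-overlap support test and its threshold are illustrative assumptions replacing the paper's retrieval-augmented LLM judgment, shown only to convey how a factuality score can replace the discriminator when ranking candidate reasoning paths.

```python
# Toy evidence corpus (assumption, stands in for a real retrieval index).
CORPUS = [
    "Aspirin inhibits platelet aggregation.",
    "The hippocampus is involved in memory formation.",
]

def retrieve(text: str) -> str:
    """Return the corpus passage with the largest word overlap
    (a stand-in for a real retriever)."""
    words = set(text.lower().split())
    return max(CORPUS, key=lambda p: len(words & set(p.lower().split())))

def factuality_score(path: list[str]) -> float:
    """Fraction of reasoning steps supported by their best-matching passage."""
    def supported(step: str) -> bool:
        passage = retrieve(step)
        overlap = set(step.lower().split()) & set(passage.lower().split())
        return len(overlap) >= 3  # illustrative support threshold
    return sum(supported(s) for s in path) / len(path)

def select_path(paths: list[list[str]]) -> list[str]:
    """Prefer the candidate reasoning path with the highest factuality score."""
    return max(paths, key=factuality_score)

grounded = ["the hippocampus is involved in memory", "so it supports recall"]
ungrounded = ["the cerebellum stores all memories", "so it supports recall"]
best = select_path([grounded, ungrounded])
```

The design point is that candidate paths are compared by evidential support rather than by a learned discriminator's preference, which is what lets retrieval quality directly shape which reasoning trajectory is returned.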