This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), an extension to the mutual reasoning framework rStar that aims to improve the reasoning accuracy and factual integrity of large language models (LLMs) on complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE adds two actions to the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries from the initial problem statement, retrieves information with those queries, and augments reasoning with the retrieved data to formulate the final answer; and A7, which retrieves information specifically for generated sub-questions and re-answers those sub-questions using the relevant context. In addition, a Retrieval-Augmented Factuality Scorer replaces the original discriminator and prioritizes reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve performance competitive with top closed-source models such as GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.
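The two retrieval actions described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ReasoningState` class, the prompt wording, the `num_queries` parameter, and the `toy_retrieve` / `toy_generate` stand-ins are all assumptions made for illustration, and the MCTS loop and the factuality scorer are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces; the abstract does not specify them.
Retriever = Callable[[str], List[str]]  # query -> retrieved passages
Generator = Callable[[str], str]        # prompt -> model output


@dataclass
class ReasoningState:
    """Assumed container for one node's question, sub-questions and answers."""
    question: str
    sub_questions: List[str] = field(default_factory=list)
    sub_answers: List[str] = field(default_factory=list)


def action_a6(state: ReasoningState, generate: Generator,
              retrieve: Retriever, num_queries: int = 2) -> str:
    """A6 sketch: build queries from the original question, retrieve, answer."""
    raw = generate(
        f"Write {num_queries} search queries, one per line, for: {state.question}"
    )
    queries = [q.strip() for q in raw.splitlines() if q.strip()][:num_queries]
    passages = [p for q in queries for p in retrieve(q)]
    context = "\n".join(passages)
    return generate(f"Context:\n{context}\nQuestion: {state.question}\nAnswer:")


def action_a7(state: ReasoningState, generate: Generator,
              retrieve: Retriever) -> ReasoningState:
    """A7 sketch: retrieve per sub-question and re-answer it with that context."""
    new_answers = []
    for sub_question in state.sub_questions:
        context = "\n".join(retrieve(sub_question))
        new_answers.append(
            generate(f"Context:\n{context}\nQuestion: {sub_question}\nAnswer:")
        )
    # Return a new state so the original node is left unchanged.
    return ReasoningState(state.question, list(state.sub_questions), new_answers)


# Toy stand-ins so the sketch runs without a model or a search index.
def toy_retrieve(query: str) -> List[str]:
    return [f"passage about {query}"]


def toy_generate(prompt: str) -> str:
    if prompt.startswith("Write"):
        return "symptoms of anemia\ncauses of fatigue"
    return "ANSWER[" + prompt + "]"
```

In a real system, `toy_generate` would be replaced by a call to the LLM and `toy_retrieve` by a search backend; both actions would then be offered to the tree search alongside the framework's existing actions.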