Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions that require multiple retrieval steps. We propose ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning, without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain: text-based thinking guides when and how to perform searches, and search results in turn inform further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct extensive experiments. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities, such as reflection and self-correction, during the reinforcement learning process.
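The interleaved reason-and-search rollout described above can be sketched as a simple generation loop. This is a minimal illustration only: the tag names (`<think>`, `<search>`, `<result>`, `<answer>`), the mock model, and the mock retriever are all assumptions for demonstration, not the paper's actual interface.

```python
import re

def mock_llm(prompt):
    """Hypothetical stand-in for the policy model: emits a search query
    until a retrieval result appears in the context, then answers."""
    if "<result>" not in prompt:
        return "<think>I need the capital first.</think><search>capital of France</search>"
    return "<think>The result says Paris.</think><answer>Paris</answer>"

def mock_retrieve(query):
    """Hypothetical retriever returning a text snippet for the query."""
    return "Paris is the capital of France."

def reason_with_search(question, max_steps=4):
    """Interleave model 'thinking' with search: whenever the model emits a
    <search>...</search> span, run retrieval and append the <result> back
    into the context before the model continues reasoning."""
    context = question
    for _ in range(max_steps):
        step = mock_llm(context)
        context += step
        match = re.search(r"<search>(.*?)</search>", step)
        if match:
            context += f"<result>{mock_retrieve(match.group(1))}</result>"
        if "<answer>" in step:
            break
    return context

out = reason_with_search("What is the capital of France?")
print(out)
```

During reinforcement learning, full rollouts of this form would be scored (e.g. by answer correctness), so the model learns when to issue searches without step-level supervision.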