Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, particularly when augmented with search mechanisms that enable systematic exploration of external knowledge bases. The field has evolved from traditional retrieval-augmented generation (RAG) frameworks to more sophisticated search-based frameworks that orchestrate multi-step reasoning through explicit search strategies. However, existing search frameworks still rely heavily on implicit natural language reasoning to determine search strategies and to decide how retrieved information should be leveraged across reasoning steps. This reliance on implicit reasoning creates fundamental challenges for managing dependencies between sub-questions, efficiently reusing previously retrieved knowledge, and learning optimal search strategies through reinforcement learning. To address these limitations, we propose Dep-Search, a dependency-aware search framework that advances beyond existing search frameworks by integrating structured reasoning, retrieval, and persistent memory through Group Relative Policy Optimization (GRPO). Dep-Search introduces explicit control mechanisms that enable the model to decompose questions with dependency relationships, retrieve information when needed, access previously stored knowledge from memory, and summarize long reasoning contexts into reusable memory entries. Through extensive experiments on seven diverse question answering datasets, we demonstrate that Dep-Search significantly enhances LLMs' ability to tackle complex multi-hop reasoning tasks, achieving substantial improvements over strong baselines across different model scales.
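The explicit control mechanisms described above (dependency-aware decomposition, on-demand retrieval, memory access, and summarization into reusable entries) can be sketched as a small control loop. This is a hypothetical illustration only, not the paper's implementation: the `SubQuestion` structure, the `toy_retrieve` stand-in corpus, and the `dep_search` function are all assumed names, and the real system learns these decisions via GRPO rather than following hand-written rules.

```python
# Hypothetical sketch of a dependency-aware search loop in the spirit of
# Dep-Search's explicit control actions. All names and the toy retriever
# are illustrative assumptions, not the actual framework.

from dataclasses import dataclass

@dataclass
class SubQuestion:
    qid: str
    text: str
    depends_on: list  # qids that must be answered before this one

def toy_retrieve(query: str) -> str:
    """Stand-in for a real retriever over an external corpus."""
    corpus = {
        "capital of france": "Paris is the capital of France.",
        "population of paris": "Paris has about 2.1 million residents.",
    }
    for key, doc in corpus.items():
        if key in query.lower():
            return doc
    return ""

def dep_search(subqs: list, memory: dict) -> dict:
    """Answer sub-questions in dependency order, consulting persistent
    memory before re-retrieving knowledge that was already stored."""
    answered = {}
    pending = list(subqs)
    while pending:
        progressed = False
        for sq in list(pending):
            # Only attempt a sub-question once its dependencies are resolved.
            if all(d in answered for d in sq.depends_on):
                if sq.text in memory:
                    # Memory access: reuse a previously stored entry.
                    answered[sq.qid] = memory[sq.text]
                else:
                    # Retrieval, then store a reusable memory entry.
                    doc = toy_retrieve(sq.text)
                    answered[sq.qid] = doc
                    memory[sq.text] = doc
                pending.remove(sq)
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unsatisfiable dependencies")
    return answered
```

For example, "What is the population of the capital of France?" decomposes into a sub-question about the capital and a dependent sub-question about its population; the loop resolves the first, stores it in memory, and only then attempts the second.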