Large language model (LLM)-based search agents have shown promise on knowledge-intensive problems by incorporating information retrieval capabilities. Existing work largely focuses on optimizing the reasoning paradigms of search agents, while the quality of the intermediate search queries issued during reasoning remains overlooked. As a result, the generated queries are often inaccurate, leading to unexpected retrieval results and ultimately limiting the agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built on two key mechanisms: (1) process rewards, which provide fine-grained supervision of the quality of each intermediate search query through Dual-Level Credit Assessment; and (2) query refinement, which improves query generation by selectively refining low-quality search queries and regenerating the subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework that guides the agent from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses confirm significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
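To make the interplay of the two mechanisms concrete, the sketch below shows one plausible shape of a reward-guided search episode: the agent proposes an intermediate query, a process reward scores it, and low-quality queries are selectively refined before retrieval, with later rounds regenerated on top of the refinement. All names here (`agent.generate_query`, `reward_model.score_query`, `agent.refine_query`, `retrieve`, `REFINE_THRESHOLD`) are hypothetical placeholders for illustration, not the authors' actual API; the real implementation is in the linked repository.

```python
# A minimal, hypothetical sketch of the process-reward-guided refinement loop
# described in the abstract. Every identifier is an assumed placeholder.

REFINE_THRESHOLD = 0.5  # assumed score below which a query counts as low-quality


def smartsearch_episode(question, agent, reward_model, retrieve, max_rounds=4):
    """Run one multi-round search episode with selective query refinement."""
    history = []  # (query, retrieved_docs) pairs from completed rounds

    for _ in range(max_rounds):
        # 1. The agent proposes the next intermediate search query in context.
        query = agent.generate_query(question, history)

        # 2. A process reward scores this single query (standing in for the
        #    paper's Dual-Level Credit Assessment).
        score = reward_model.score_query(question, history, query)

        # 3. Selective refinement: only low-quality queries are rewritten.
        #    Since each later round is generated conditioned on this round's
        #    retrieval results, continuing the loop from the refined query
        #    effectively regenerates all subsequent search rounds.
        if score < REFINE_THRESHOLD:
            query = agent.refine_query(question, history, query, score)

        docs = retrieve(query)  # external retrieval call
        history.append((query, docs))

        # Stop early once the accumulated evidence suffices to answer.
        if agent.has_sufficient_evidence(question, history):
            break

    return agent.answer(question, history)
```

Note that scoring happens before retrieval in this sketch, so a refined query replaces the bad one within the same round; the paper's training-time variant of regeneration may differ.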