Search-augmented reasoning agents interleave multi-step reasoning with external information retrieval, but uncontrolled retrieval often leads to redundant evidence, context saturation, and unstable learning. Existing approaches rely on outcome-based reinforcement learning (RL), which provides limited guidance for regulating information acquisition. We propose DeepControl, a framework for adaptive information control based on a formal notion of information utility, which measures the marginal value of retrieved evidence given the current reasoning state. Building on this utility, we introduce retrieval continuation and granularity control mechanisms that selectively regulate when to continue or stop retrieval, and how much information to expand. An annealed control strategy enables the agent to internalize effective information-acquisition behaviors during training. Extensive experiments across seven benchmarks demonstrate that our method consistently outperforms strong baselines. In particular, our approach achieves average performance improvements of 9.4% and 8.6% on Qwen2.5-7B and Qwen2.5-3B, respectively, over strong outcome-based RL baselines, and consistently outperforms both retrieval-free and retrieval-based reasoning methods that lack explicit information control. These results highlight the importance of adaptive information control for scaling search-augmented reasoning agents to complex, real-world information environments.