Deep search agents powered by large language models have demonstrated strong capabilities in multi-step retrieval, reasoning, and long-horizon task execution. However, their practical failures often stem from the lack of mechanisms to monitor and regulate reasoning and retrieval states as tasks evolve under uncertainty. Insights from cognitive neuroscience suggest that human metacognition is hierarchically organized, integrating fast anomaly detection with selectively triggered, experience-driven reflection. In this work, we propose Deep Search with Meta-Cognitive Monitoring (DS-MCM), a deep search framework augmented with an explicit hierarchical metacognitive monitoring mechanism. DS-MCM integrates a Fast Consistency Monitor, which performs lightweight checks on the alignment between external evidence and internal reasoning confidence, and a Slow Experience-Driven Monitor, which is selectively activated to guide corrective intervention based on experience memory from historical agent trajectories. By embedding monitoring directly into the reasoning-retrieval loop, DS-MCM determines both when intervention is warranted and how corrective actions should be informed by prior experience. Experiments across multiple deep search benchmarks and backbone models demonstrate that DS-MCM consistently improves performance and robustness.
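The two-level monitoring described above can be illustrated with a minimal sketch. It is not the DS-MCM implementation: the names `StepState`, `Experience`, `fast_consistency_check`, `slow_experience_monitor`, and `monitor_step`, the scalar scores in [0, 1], the 0.3 threshold, and the nearest-neighbour lookup are all assumptions chosen for illustration. The sketch shows only the control flow the abstract describes, in which a cheap check runs at every step and the experience-driven monitor is consulted only when that check flags a mismatch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepState:
    """Hypothetical snapshot of one reasoning-retrieval step."""
    evidence_support: float  # in [0, 1]: how strongly retrieved evidence backs the current claim
    confidence: float        # in [0, 1]: the agent's own reasoning confidence


@dataclass
class Experience:
    """Hypothetical record distilled from a past agent trajectory."""
    evidence_support: float
    confidence: float
    advice: str


def fast_consistency_check(state: StepState, threshold: float = 0.3) -> bool:
    """Return True when evidence and confidence disagree by more than `threshold`.

    Assumption: misalignment is measured as a simple absolute gap.
    """
    return abs(state.confidence - state.evidence_support) > threshold


def slow_experience_monitor(state: StepState, memory: List[Experience]) -> Optional[str]:
    """Return the advice of the stored experience closest to `state`.

    Assumption: similarity is squared Euclidean distance over the two scores.
    Returns None when the memory is empty.
    """
    if not memory:
        return None
    nearest = min(
        memory,
        key=lambda e: (e.evidence_support - state.evidence_support) ** 2
        + (e.confidence - state.confidence) ** 2,
    )
    return nearest.advice


def monitor_step(state: StepState, memory: List[Experience],
                 threshold: float = 0.3) -> Optional[str]:
    """Run the fast check; invoke the slow monitor only if it fires."""
    if not fast_consistency_check(state, threshold):
        return None  # no anomaly detected, so the agent continues unchanged
    return slow_experience_monitor(state, memory)


# Illustrative memory; the entries are made up for this sketch.
memory = [
    Experience(0.1, 0.9, "reformulate the query and retrieve again"),
    Experience(0.9, 0.2, "commit to the evidence-backed answer"),
]

aligned = StepState(evidence_support=0.8, confidence=0.85)
overconfident = StepState(evidence_support=0.2, confidence=0.9)

print(monitor_step(aligned, memory))
print(monitor_step(overconfident, memory))
```

In this sketch the aligned step passes the fast check and triggers no intervention, while the overconfident step is routed to the slow monitor, which returns the advice attached to the most similar stored experience. Gating the slow path this way keeps the per-step overhead small, which is the motivation the abstract gives for the hierarchical design.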