Deep research, which requires reasoning over web evidence to answer open-ended questions, is a core capability for AI agents. Yet many deep research agents still rely on implicit, unstructured search behavior, which leads to redundant exploration and brittle evidence aggregation. Motivated by Anthropic's "think" tool paradigm and insights from the information-retrieval literature, we introduce Q+, a set of query and evidence processing tools that make web search more deliberate by guiding query planning, monitoring search progress, and extracting evidence from long web snapshots. We integrate Q+ into the browser sub-agent of Eigent, an open-source, production-ready multi-agent workforce for computer use, yielding EigentSearch-Q+. Across four benchmarks (SimpleQA-Verified, FRAMES, WebWalkerQA, and X-Bench DeepSearch), Q+ improves the benchmark-size-weighted average accuracy of Eigent's browser agent by 3.0, 3.8, and 0.6 percentage points (pp) for the GPT-4.1, GPT-5.1, and Minimax M2.5 model backends, respectively. Case studies further suggest that EigentSearch-Q+ produces more coherent tool-calling trajectories by making search progress and evidence handling explicit.
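The headline metric above is a benchmark-size-weighted average accuracy. The abstract does not spell out the formula, so the following is only a minimal sketch under the assumption that each benchmark's accuracy is weighted by its number of questions. The function name `size_weighted_accuracy` and the benchmark names and numbers in the usage example are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Tuple


def size_weighted_accuracy(results: Dict[str, Tuple[float, int]]) -> float:
    """Average per-benchmark accuracies, weighting each by its question count.

    `results` maps a benchmark name to (accuracy in [0, 1], number of questions).
    Assumption: "benchmark-size-weighted" means weighting by question count.
    """
    total_questions = sum(size for _, size in results.values())
    if total_questions == 0:
        raise ValueError("at least one benchmark must contain questions")
    weighted_sum = sum(acc * size for acc, size in results.values())
    return weighted_sum / total_questions


# Hypothetical placeholder values, chosen only to illustrate the arithmetic.
example = {
    "bench_a": (0.50, 100),
    "bench_b": (0.80, 300),
}
weighted = size_weighted_accuracy(example)
unweighted = sum(acc for acc, _ in example.values()) / len(example)
```

Under this reading, larger benchmarks dominate the average: in the placeholder example the weighted value sits closer to the accuracy of `bench_b` than the plain mean of the two accuracies does.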