LLM-powered search agents are increasingly used for multi-step information-seeking tasks, yet the IR community lacks an empirical understanding of how agentic search sessions unfold and how retrieved evidence is used. This paper presents a large-scale log analysis of agentic search based on 14.44M search requests (3.97M sessions) collected from DeepResearchGym, an open-source search API accessed by external agentic clients. We sessionize the logs, assign session-level intents and step-wise query-reformulation labels using LLM-based annotation, and propose the Context-driven Term Adoption Rate (CTAR) to quantify whether newly introduced query terms are traceable to previously retrieved evidence. Our analyses reveal distinctive behavioral patterns. First, over 90% of multi-turn sessions contain at most ten steps, and 89% of inter-step intervals fall under one minute. Second, behavior varies by intent: fact-seeking sessions exhibit high repetition that increases over time, while sessions requiring reasoning sustain broader exploration. Third, agents reuse evidence across steps: on average, 54% of newly introduced query terms appear in the accumulated evidence context, with contributions from earlier steps beyond the most recent retrieval. These findings suggest that agentic search may benefit from repetition-aware early stopping, intent-adaptive retrieval budgets, and explicit cross-step context tracking. We plan to release the anonymized logs to support future research.
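To make the metric concrete, the following is a minimal sketch of a CTAR-style computation, assuming whitespace tokenization and a simple set-based notion of "term"; the paper's exact tokenization, normalization, and aggregation may differ, and the function name `ctar` is our own:

```python
def ctar(queries, evidence):
    """Sketch of a Context-driven Term Adoption Rate.

    queries:  list of query strings, one per step.
    evidence: list of retrieved-evidence strings; evidence[i] was
              retrieved at step i and becomes context for later steps.
    Returns the mean, over steps with new terms, of the fraction of
    newly introduced query terms found in the accumulated evidence.
    """
    scores = []
    seen_terms = set(queries[0].lower().split())  # terms used so far
    context = set()                               # accumulated evidence terms
    for t in range(1, len(queries)):
        context |= set(evidence[t - 1].lower().split())
        terms = set(queries[t].lower().split())
        new_terms = terms - seen_terms            # terms not in any prior query
        if new_terms:
            scores.append(len(new_terms & context) / len(new_terms))
        seen_terms |= terms
    return sum(scores) / len(scores) if scores else 0.0
```

For example, if a follow-up query adds the terms "retrieval budget" and both appear in evidence retrieved earlier in the session, that step's adoption rate is 1.0; the session-level CTAR averages these per-step rates.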