With automated systems increasingly issuing search queries alongside humans, Information Retrieval (IR) faces a major shift. Yet IR remains human-centred: its systems, evaluation metrics, user models, and datasets are designed around human queries and behaviours. Consequently, IR operates under assumptions that no longer hold in practice, as workload volumes, predictability, and querying behaviours change. This misalignment affects system performance and optimisation: caching may lose effectiveness, query pre-processing may add overhead without improving results, and standard metrics may mismeasure satisfaction. Without adaptation, retrieval models risk satisfying neither humans nor the emerging user segment of agents. However, datasets capturing agent search behaviour are lacking, a critical gap given IR's historical reliance on data-driven evaluation and optimisation. We develop a methodology for collecting all the data produced and consumed by agentic retrieval-augmented systems when answering queries, and we release the Agentic Search Queryset (ASQ) dataset. ASQ contains reasoning-induced queries, retrieved documents, and thoughts for queries from HotpotQA, Researchy Questions, and MS MARCO, covering 3 diverse agents and 2 retrieval pipelines. The accompanying toolkit enables ASQ to be extended to new agents, retrievers, and datasets.