AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this dataset, we characterize query patterns, engagement behaviors, and how usage evolves with experience. We find that users submit longer and more complex queries than in traditional search, and treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. With experience, users issue more targeted queries and engage more deeply with supporting citations, although keyword-style queries persist even among experienced users. We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.