Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset

Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang, Uri Katz, Mark Polak, Sangho Suh, Harshit Surana, Aryeh Tiktinsky, Shriya Atmakuri, Jonathan Bragg, Mike D'Arcy, Sergey Feldman, Amal Hassan-Ali, Rubén Lozano, Bodhisattwa Prasad Majumder, Charles McGrady, Amanpreet Singh, Brooke Vlahos, Yoav Goldberg, Doug Downey

AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this dataset, we characterize query patterns, engagement behaviors, and how usage evolves with experience. We find that users submit longer and more complex queries than in traditional search, and treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. With experience, users issue more targeted queries and engage more deeply with supporting citations, although keyword-style queries persist even among experienced users. We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.

