Recent progress in Large Language Model (LLM) technology has changed our role in interacting with these models. Instead of primarily testing these models with questions whose answers we already know, we now pose queries whose answers are unknown to us, driven by human curiosity. This shift highlights the growing need to understand curiosity-driven human questions: questions that are more complex, open-ended, and reflective of real-world needs. To this end, we present Quriosity, a collection of 13.5K naturally occurring questions from three diverse sources: human-to-search-engine queries, human-to-human interactions, and human-to-LLM conversations. Our comprehensive collection enables a rich understanding of human curiosity across various domains and contexts. Our analysis reveals a significant presence of causal questions (up to 42%) in the dataset, for which we develop an iterative prompt improvement framework to identify all causal queries and examine their unique linguistic properties, cognitive complexity, and source distribution. Our paper paves the way for future work on causal question identification and open-ended chatbot interactions. Our code and data are available at https://github.com/roberto-ceraolo/quriosity.