Online shopping is increasingly common in everyday life. For e-commerce search systems, understanding natural language arriving through voice assistants, chatbots, or conversational search is essential for determining what the user really wants. However, evaluation datasets that capture the natural, detailed information needs of product seekers and that could be used for research do not exist. Because of privacy concerns and competitive considerations, only a few datasets with real user search queries from logs are openly available. In this paper, we present a dataset of 3,540 natural language queries in two domains that describe what users want when searching for a laptop or a jacket of their choice. The dataset includes annotations of vague terms and key facts for 1,754 laptop queries. It opens up a range of research opportunities in natural language processing and (interactive) information retrieval for product search.