Augmenting Ad-Hoc IR Dataset for Interactive Conversational Search

A distinctive feature of conversational search systems is that they are mixed-initiative: the system can, for example, generate clarifying questions about the user's query. Evaluating such systems at large scale on the end task of IR is very challenging and requires adequate datasets containing such interactions. However, current datasets focus on either traditional ad-hoc IR tasks or query clarification tasks, the latter usually being treated as a reformulation of the initial query. The only two datasets known to us that contain both document relevance judgments and the associated clarification interactions are Qulac and ClariQ. Both are based on the TREC Web Track 2009-12 collection but cover a very limited number of topics (237), far too few for training and testing conversational IR models. To fill this gap, we propose a methodology to automatically build large-scale conversational IR datasets from ad-hoc IR datasets in order to facilitate research on conversational IR. Our methodology is based on two processes: 1) generating query clarification interactions through query clarification and answer generators, and 2) augmenting ad-hoc IR datasets with simulated interactions. In this paper, we focus on MsMarco and augment it with query clarification and answer simulations. We perform a thorough evaluation showing the quality and the relevance of the generated interactions for each initial query. This paper shows the feasibility and utility of augmenting ad-hoc IR datasets for conversational IR.
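The two processes named in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the paper's implementation: the class names, the `augment` function, the use of candidate "facets", and the two toy generators are hypothetical stand-ins for the learned clarification and answer generators that the abstract mentions but does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AdHocExample:
    """One ad-hoc IR example: a query and a passage judged relevant to it."""
    query: str
    relevant_passage: str


@dataclass
class ConversationalExample:
    """The same example, augmented with simulated (question, answer) turns."""
    query: str
    relevant_passage: str
    interactions: List[Tuple[str, str]] = field(default_factory=list)


def toy_question_generator(query: str, facet: str) -> str:
    # Hypothetical stand-in for a learned clarifying-question generator.
    return f"Are you interested in {facet}?"


def toy_answer_generator(facet: str, relevant_passage: str) -> str:
    # Hypothetical stand-in for a simulated user: answers "yes" when the
    # facet string occurs in the relevant passage, "no" otherwise.
    return "yes" if facet.lower() in relevant_passage.lower() else "no"


def augment(
    example: AdHocExample,
    candidate_facets: List[str],
    ask: Callable[[str, str], str] = toy_question_generator,
    answer: Callable[[str, str], str] = toy_answer_generator,
) -> ConversationalExample:
    """Process 1 generates the interactions; process 2 attaches them to the
    original ad-hoc example, leaving its relevance judgment unchanged."""
    turns = [
        (ask(example.query, facet), answer(facet, example.relevant_passage))
        for facet in candidate_facets
    ]
    return ConversationalExample(example.query, example.relevant_passage, turns)


if __name__ == "__main__":
    ex = AdHocExample(
        query="jaguar speed",
        relevant_passage="The jaguar is a big cat that can run up to 80 km/h.",
    )
    for question, reply in augment(ex, ["big cat", "car"]).interactions:
        print(question, "->", reply)
```

In this sketch the simulated answer is derived from the passage already judged relevant, so the augmented example keeps the original relevance label while gaining clarification turns; whether the paper conditions its answer generator in this way is not stated in the abstract.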

