As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets such as MS MARCO primarily provide short keyword queries without accompanying intents or descriptions, which makes the underlying information need difficult to discern. This paper proposes an approach to augmenting such datasets with informative query descriptions, focusing on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology uses state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from these benchmarks. By extracting key semantic elements, we construct detailed and contextually rich descriptions for the queries. To validate the generated descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human judgments of their accuracy and informativeness. The resulting annotations can serve as an evaluation set for tasks such as ranking and query rewriting.
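The description-generation step could be sketched as follows. This is a minimal illustration only: the prompt wording, the `llm_complete` callable, and the example query are assumptions for exposition, not the paper's actual prompts or model.

```python
# Sketch of generating an intent description for a short keyword query.
# The prompt template and LLM interface here are illustrative assumptions.

def build_description_prompt(query: str) -> str:
    """Construct an instruction asking an LLM to expand a short web
    search query into a description of its likely information need."""
    return (
        "Given the short web search query below, write a one-paragraph "
        "description of the information need it most plausibly expresses.\n\n"
        f"Query: {query}\n"
        "Description:"
    )

def describe_query(query: str, llm_complete) -> str:
    """Generate a description via any callable
    `llm_complete(prompt) -> str` (e.g. a wrapper around a chat API)."""
    return llm_complete(build_description_prompt(query)).strip()

# Example with a stub standing in for a real LLM call:
stub = lambda prompt: "  The user wants statistics about the dataset.  "
print(describe_query("ms marco dataset size", stub))
```

In practice `llm_complete` would wrap a real model call, and the returned descriptions would then be routed to crowd workers for validation as described above.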