Exclusion is an important and universal linguistic skill that humans use to express what they do not want. However, the information retrieval community has done little research on exclusionary retrieval, in which users state in their queries what they do not want. In this work, we investigate exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set of resources for exclusionary retrieval consisting of an evaluation benchmark and a training set that helps retrieval models comprehend exclusionary queries. The evaluation benchmark includes 3,452 high-quality exclusionary queries, each of which has been manually annotated. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document. We conduct detailed experiments and analyses and obtain three main observations: (1) existing retrieval models with different architectures struggle to comprehend exclusionary queries; (2) although integrating our training data improves the performance of retrieval models on exclusionary retrieval, a gap to human performance remains; (3) generative retrieval models have a natural advantage in handling exclusionary queries. To facilitate future research on exclusionary retrieval, we share the benchmark and evaluation scripts at \url{https://github.com/zwh-sdu/ExcluIR}.