LongEval-Retrieval is a Web document retrieval benchmark that focuses on continuous retrieval evaluation. This test collection is intended for studying the temporal persistence of Information Retrieval systems and will be used as the test collection in the Longitudinal Evaluation of Model Performance Track (LongEval) at CLEF 2023. The benchmark simulates an evolving information system environment, such as the one a Web search engine operates in, where the document collection, the query distribution, and relevance all change continuously, while following the Cranfield paradigm for offline evaluation. To this end, we introduce the concept of a dynamic test collection composed of successive sub-collections, each representing the state of an information system at a given time step. In LongEval-Retrieval, each sub-collection contains a set of queries, documents, and soft relevance assessments built from click models. The data comes from Qwant, a privacy-preserving Web search engine that primarily focuses on the French market. LongEval-Retrieval also provides a 'mirror' collection: it is first built in French, to benefit from the majority of Qwant's traffic, and then translated into English. This paper presents the creation process of LongEval-Retrieval and provides baseline runs and analysis.
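As a rough illustration of the dynamic test collection described above, the following Python sketch models each sub-collection as a snapshot holding queries, documents, and soft relevance scores, and reports how a ranker's score shifts relative to the first time step. All names here (`SubCollection`, `mean_precision_at_k`, `persistence_report`), the precision-at-k measure, and the 0.5 relevance threshold are assumptions made for illustration; they are not the LongEval-Retrieval data format or the track's official evaluation protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical container for one time step; not the actual LongEval file layout.
@dataclass
class SubCollection:
    timestep: str                        # label of the snapshot, e.g. "t0"
    queries: Dict[str, str]              # query id -> query text
    documents: Dict[str, str]            # doc id -> document text
    qrels: Dict[str, Dict[str, float]]   # query id -> {doc id: soft relevance in [0, 1]}

# A ranker maps (query text, documents) to a list of doc ids, best first.
Ranker = Callable[[str, Dict[str, str]], List[str]]

def mean_precision_at_k(ranker: Ranker, sub: SubCollection,
                        k: int = 10, threshold: float = 0.5) -> float:
    """Mean P@k over the queries of one sub-collection.

    Soft relevance is binarized at `threshold` (an assumed cut-off).
    """
    scores = []
    for qid, text in sub.queries.items():
        ranked = ranker(text, sub.documents)[:k]
        judged = sub.qrels.get(qid, {})
        hits = sum(1 for doc_id in ranked if judged.get(doc_id, 0.0) >= threshold)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0

def persistence_report(ranker: Ranker, collections: List[SubCollection],
                       k: int = 10) -> Dict[str, float]:
    """Score change of each time step relative to the first sub-collection."""
    if not collections:
        return {}
    per_step = {c.timestep: mean_precision_at_k(ranker, c, k) for c in collections}
    baseline = per_step[collections[0].timestep]
    return {step: score - baseline for step, score in per_step.items()}
```

A negative value for a later time step indicates that the same system scores lower on that snapshot than on the first one, which is one simple way to read temporal persistence; other measures such as nDCG could be substituted for precision at k without changing the structure.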