T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

From arXiv. This resource paper has been accepted by the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023).

Passage ranking involves two stages, passage retrieval and passage re-ranking, both of which are important and challenging topics for academia and industry in Information Retrieval (IR). However, the commonly used datasets for passage ranking mostly focus on English. For non-English scenarios such as Chinese, the existing datasets are limited in data scale, lack fine-grained relevance annotation, and suffer from false negatives. To address these problems, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded (fine-grained) relevance scores for query-passage pairs instead of binary (coarse-grained) relevance judgments. To mitigate the false negative issue, more passages with higher diversity are considered during relevance annotation, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided to facilitate further studies, such as query types and the XML files of the documents from which the passages are generated. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and that there is still room for improvement. The full data and all code are available at https://github.com/THUIR/T2Ranking/
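To illustrate why 4-level graded relevance can give a more discriminating evaluation than binary judgments, the following is a minimal sketch, not code from the T2Ranking repository. It assumes labels in the range 0-3, uses NDCG with the common 2^rel - 1 gain as the metric (the abstract does not name a metric), and uses a hypothetical binarization threshold of 2. The label lists are made up for illustration.

```python
import math


def dcg(labels):
    """Discounted cumulative gain using the common 2^rel - 1 gain."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels))


def ndcg(labels_in_ranked_order):
    """NDCG over a full ranked list of relevance labels."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0


def binarize(labels, threshold=2):
    """Collapse graded labels to binary ones (threshold is an assumption)."""
    return [1 if rel >= threshold else 0 for rel in labels]


# Hypothetical graded labels (0-3) for two rankings of the same four passages.
ranking_a = [3, 2, 1, 0]  # best passage ranked first
ranking_b = [2, 3, 1, 0]  # top two passages swapped

graded_a, graded_b = ndcg(ranking_a), ndcg(ranking_b)
binary_a, binary_b = ndcg(binarize(ranking_a)), ndcg(binarize(ranking_b))

print("graded:", graded_a, graded_b)
print("binary:", binary_a, binary_b)
```

Under these assumptions, the graded labels score ranking_a above ranking_b, while the binarized labels cannot tell the two rankings apart because both top passages collapse to the same label.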

