BEIR-PL: Zero Shot Information Retrieval Benchmark for the Polish Language

The BEIR dataset is a large, heterogeneous benchmark for Information Retrieval (IR) in zero-shot settings that has attracted considerable attention in the research community. However, BEIR and similar datasets are predominantly restricted to English. Our objective is to establish large-scale resources for IR in the Polish language and thereby advance research in this area of NLP. In this work, inspired by the mMARCO and Mr. TyDi datasets, we translated all accessible open IR datasets into Polish and introduced BEIR-PL, a new benchmark comprising 13 datasets that facilitates further development, training and evaluation of modern Polish language models for IR tasks. We evaluated and compared numerous IR models on the newly introduced BEIR-PL benchmark. Furthermore, we publish pre-trained open IR models for the Polish language, marking a pioneering development in this field. The evaluation also revealed that BM25 achieved significantly lower scores for Polish than for English, which can be attributed to the high inflection and intricate morphological structure of the Polish language. Finally, we trained various re-ranking models to enhance BM25 retrieval and compared their performance to identify their characteristic features. To ensure accurate model comparisons, it is necessary to scrutinise individual results rather than average across the entire benchmark. We therefore thoroughly analysed the outcomes of the IR models on each individual data subset of the BEIR benchmark. The benchmark data is available at https://huggingface.co/clarin-knext.
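One way to picture why exact-match scoring such as BM25 can struggle with an inflected language is the toy sketch below. It is not the authors' setup: the two-sentence corpus, the `bm25_scores` helper, and the tiny `LEMMAS` table (a stand-in for a real Polish lemmatiser) are all assumptions made for illustration. The query uses the nominative "kot" (cat), while the relevant document contains only the genitive "kota".

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document against a tokenised query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # BM25 only rewards exact surface-form matches
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (illustrative only): "kot" occurs only in an inflected form.
docs = [
    "nie widziałem kota".split(),   # genitive "kota"
    "pies śpi w ogrodzie".split(),
]
query = ["kot"]  # nominative form

# Hypothetical lemma table standing in for a real Polish lemmatiser.
LEMMAS = {"kota": "kot", "kotem": "kot"}


def lemmatise(tokens):
    return [LEMMAS.get(t, t) for t in tokens]


surface = bm25_scores(query, docs)
lemmatised = bm25_scores(lemmatise(query), [lemmatise(d) for d in docs])

print("surface forms:", surface)
print("lemmatised:   ", lemmatised)
```

On surface forms the relevant document gets no credit because "kot" and "kota" are different tokens; after mapping both to one lemma it receives a positive score. The real effect on BEIR-PL depends on the tokeniser and analyser used, which this sketch does not model.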

