In this paper we present NorQuAD, the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare their results against human performance. The dataset will be made freely available.