In this paper, we create NaijaRC, a new multiple-choice reading comprehension dataset for three native Nigerian languages, based on high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer from the existing English RACE and Belebele training datasets using a pre-trained encoder-only model. Additionally, we provide results obtained by prompting large language models (LLMs) such as GPT-4.