Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It utilizes interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 370K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on manual annotations of 790 QA pairs from MuNfQuAD (golden set), we observe that 98\% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines. The APS model attained an accuracy of 80\% and 72\%, as well as a macro F1 of 72\% and 66\%, on the MuNfQuAD test set and the golden set, respectively. Furthermore, the APS model effectively generalizes to certain languages within the golden set, even after being fine-tuned on silver labels.