Question Answering (QA) datasets are crucial for assessing reading comprehension skills in both machines and humans. While numerous datasets have been developed in English for this purpose, a noticeable void exists in less-resourced languages. To alleviate this gap, our paper introduces machine-translated versions of FairytaleQA, a renowned QA dataset designed to assess and enhance narrative comprehension skills in young children. By employing fine-tuned, modest-scale models, we establish benchmarks for both Question Generation (QG) and QA tasks within the translated datasets. In addition, we present a case study proposing a model for generating question-answer pairs, with an evaluation incorporating quality metrics such as question well-formedness, answerability, relevance, and suitability for children. Our evaluation prioritizes quantifying and describing error cases, along with providing directions for future work. This paper contributes to the advancement of QA and QG research in less-resourced languages, promoting accessibility and inclusivity in the development of these models for reading comprehension. The code and data are publicly available at github.com/bernardoleite/fairytaleqa-translated.