Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. It is a versatile language resource, particularly useful for testing whether current machine learning models can acquire expert scientific knowledge. Despite this potential, no detailed set of baselines has yet been developed for Pirá. Such baselines would make it easier for researchers to use Pirá as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset: closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, in which we fixed a number of grammatical issues, repetitions, and other shortcomings. Furthermore, we extended the dataset in several new directions to support these benchmarks: translation of the supporting texts from English into Portuguese, answerability classification labels, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges posed by the Pirá dataset.