Scientific literature is typically dense, requiring significant background knowledge and deep comprehension for effective engagement. We introduce SciDQA, a new reading comprehension dataset of 2,937 QA pairs that challenges LLMs to demonstrate deep understanding of scientific articles. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers from paper authors, ensuring a thorough examination of the literature. We enhance the dataset's quality through a process that carefully filters out low-quality questions, decontextualizes the content, tracks the source document across versions, and incorporates a bibliography for multi-document question answering. Questions in SciDQA require reasoning over figures, tables, equations, appendices, and supplementary materials, as well as across multiple documents. We evaluate several open-source and proprietary LLMs in various configurations to assess their ability to generate relevant and factual responses. Our comprehensive evaluation, based on surface-level similarity metrics and LLM judgments, reveals notable performance discrepancies. SciDQA is a rigorously curated, naturally derived scientific QA dataset designed to facilitate research on complex scientific text understanding.