We introduce SciQAG, a novel framework built on large language models (LLMs) for automatically generating high-quality scientific question-answer pairs from a large corpus of scientific literature. SciQAG consists of a QA generator and a QA evaluator, which work together to extract diverse, research-level questions and answers from scientific papers. Using this framework, we construct a large-scale, high-quality, open-ended science QA dataset containing 188,042 QA pairs extracted from 22,743 scientific papers across 24 scientific domains. We also introduce SciQAG-24D, a new benchmark task designed to evaluate the scientific question-answering ability of LLMs. Extensive experiments demonstrate that fine-tuning LLMs on the SciQAG dataset significantly improves their performance on both open-ended question answering and scientific tasks. To foster research and collaboration, we publicly release the datasets, models, and evaluation code, contributing to the advancement of science question answering and the development of more interpretable, reasoning-capable AI systems.