The use of question-answer (QA) pairs for training and evaluating large language models (LLMs) has attracted considerable attention. Yet few existing QA datasets are derived from knowledge in the scientific literature. Here we bridge this gap by presenting Automatic Generation of Scientific Question Answers (SciQAG), a framework for the automatic generation and evaluation of scientific QA pairs sourced from published scientific literature. We fine-tune an open-source LLM to generate \num{960000} scientific QA pairs from full-text scientific papers and propose a five-dimensional metric for assessing the quality of the generated QA pairs. LLM-based evaluation shows that the generated QA pairs consistently achieve an average score of 2.5 out of 3 across all five dimensions, indicating that our framework can distill key knowledge from papers into high-quality QA pairs at scale. We make the dataset, models, and evaluation code publicly available.