Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have been applied to literature review generation, the absence of large-scale datasets has impeded progress. We release SciReviewGen, a dataset consisting of over 10,000 literature reviews and the 690,000 papers cited in them. Using this dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including a Fusion-in-Decoder model extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while also revealing open challenges in automatic literature review generation, such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.