Science progresses by incrementally building upon the prior body of knowledge documented in scientific publications. The acceleration of research across many fields makes it hard to stay up-to-date with recent developments and to summarize the ever-growing body of prior work. To address this issue, the task of citation text generation aims to produce accurate textual summaries given a set of papers-to-cite and the citing paper context. Existing studies in citation text generation are based upon widely diverging task definitions, which makes it hard to study this task systematically. To tackle this challenge, we propose CiteBench: a benchmark for citation text generation that unifies multiple diverse datasets and enables standardized evaluation of citation text generation models across task designs and domains. Using the new benchmark, we investigate the performance of multiple strong baselines, test their transferability between the datasets, and deliver new insights into the task definition and evaluation to guide future research in citation text generation. We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.