Systematic literature reviews (SLRs) play an essential role in summarising, synthesising and validating scientific evidence. In recent years, there has been growing interest in using machine learning techniques to automate the identification of relevant studies for SLRs. However, the lack of standardised evaluation datasets makes it difficult to compare the performance of such automated literature screening systems. In this paper, we analyse existing citation screening evaluation datasets and find that many of them are too small, suffer from data leakage, or have limited applicability to systems that treat automated literature screening as a classification task, as opposed to, for example, a retrieval or question-answering task. To address these challenges, we introduce CSMeD, a meta-dataset that consolidates nine publicly released collections and provides unified access to 325 SLRs from the fields of medicine and computer science. CSMeD serves as a comprehensive resource for training automated citation screening models and evaluating their performance. Additionally, we introduce CSMeD-FT, a new dataset designed explicitly for evaluating the full-text publication screening task. To demonstrate the utility of CSMeD, we conduct experiments and establish baselines on the new datasets.