Systematic reviews are the standard method for synthesizing scientific evidence, but their creation requires substantial manual effort, particularly during retrieval and screening. While recent work has explored automating these steps, evaluation resources remain largely confined to the biomedical domain, limiting reproducible experimentation in other domains. This paper introduces SR4CS, a large-scale collection of systematic reviews in computer science, designed to support reproducible research on Boolean query generation, retrieval, and screening. The corpus comprises 1,212 systematic reviews with their original expert-designed Boolean search queries, 104,316 resolved references, and structured methodological metadata. For controlled evaluation, the original Boolean queries are additionally provided in a normalized, approximated form operating over titles and abstracts. To illustrate the intended use of the collection, baseline experiments compare the approximated expert Boolean queries with zero-shot LLM-generated Boolean queries, BM25, and dense retrieval under a unified evaluation setting. The results highlight systematic differences in precision, recall, and ranking behavior across retrieval paradigms and expose limitations of naive zero-shot Boolean generation. SR4CS is released under an open license on Zenodo (https://doi.org/10.5281/zenodo.17163932), together with documentation and code (https://github.com/webis-de/scolia26-sr4cs), to enable reproducible evaluation and future research on scaling systematic review automation.
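As a rough illustration of the evaluation setting described above, the following minimal sketch evaluates a Boolean query over titles and abstracts and scores the retrieved set against a review's included references. It is not the SR4CS code or data format: the nested-tuple query representation, the `docs` dictionary, the toy documents, and all function names are assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, tokens):
    """Evaluate a nested Boolean query against a token set.

    A query is either a single term (str) or a tuple of the form
    ("AND" | "OR" | "NOT", operand, ...). This representation is a
    hypothetical one chosen for the sketch.
    """
    if isinstance(query, str):
        return query.lower() in tokens
    op, *args = query
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")


def boolean_retrieve(query, docs):
    """Return the ids of all documents whose title + abstract match."""
    return {
        doc_id
        for doc_id, doc in docs.items()
        if matches(query, tokenize(doc["title"] + " " + doc["abstract"]))
    }


def precision_recall(retrieved, relevant):
    """Set-based precision and recall; 0.0 when a denominator is empty."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy collection (invented for illustration only).
docs = {
    "d1": {"title": "Systematic review of code smells",
           "abstract": "We survey detection of code smells in software."},
    "d2": {"title": "Deep learning for image classification",
           "abstract": "A survey of convolutional networks."},
    "d3": {"title": "Refactoring and software quality",
           "abstract": "An empirical study of refactoring."},
    "d4": {"title": "Software testing with machine learning",
           "abstract": "A mapping study."},
}

# software AND (smells OR refactoring)
query = ("AND", "software", ("OR", "smells", "refactoring"))

retrieved = boolean_retrieve(query, docs)
relevant = {"d1", "d4"}  # hypothetical references included by the review
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, recall)
```

The same `precision_recall` helper could be applied to a top-k cutoff of a ranked list (for example from BM25 or a dense retriever) so that Boolean and ranked runs are scored the same way; how the paper actually aligns the two is not stated in the abstract.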