FRONTIER-RevRec: A Large-scale Dataset for Reviewer Recommendation

Reviewer recommendation is a critical task for improving the efficiency of academic publishing workflows. However, research in this area has long been hindered by a lack of high-quality benchmark datasets: existing datasets are often limited in scale and disciplinary scope, and offer little comparative analysis of different methods. To address this gap, we introduce FRONTIER-RevRec, a large-scale dataset constructed from authentic peer review records (2007-2025) from the Frontiers open-access publishing platform (https://www.frontiersin.org/). The dataset contains 177,941 distinct reviewers and 478,379 papers across 209 journals, spanning disciplines that include clinical medicine, biology, psychology, engineering, and the social sciences. Our comprehensive evaluation on this dataset shows that content-based methods significantly outperform collaborative filtering. Our structural analysis explains this finding by uncovering fundamental differences between academic recommendation and commercial recommendation domains. Notably, approaches that leverage language models are particularly effective at capturing the semantic alignment between a paper's content and a reviewer's expertise. Furthermore, our experiments identify optimal aggregation strategies for enhancing the recommendation pipeline. FRONTIER-RevRec is intended to serve as a comprehensive benchmark for advancing research in reviewer recommendation and for facilitating the development of more effective academic peer review systems. The FRONTIER-RevRec dataset is available at: https://anonymous.4open.science/r/FRONTIER-RevRec-5D05.
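The abstract contrasts content-based methods with collaborative filtering and mentions aggregation strategies without detailing them. As a rough illustration only, here is one common way a content-based pipeline can score a reviewer: compare a submission with each paper in the reviewer's history, then aggregate those similarities with max or mean. The helper names (`embed`, `score_reviewer`, `recommend`), the bag-of-words stand-in for a language-model encoder, and the two aggregation options are assumptions made for this sketch; they are not the FRONTIER-RevRec methods or its data format.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a language-model encoder:
    a lowercase bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_reviewer(paper: str, reviewer_history: list[str], agg: str = "max") -> float:
    """Hypothetical helper: aggregate paper-to-history similarities
    into a single reviewer score using max or mean aggregation."""
    target = embed(paper)
    sims = [cosine(target, embed(past)) for past in reviewer_history]
    if not sims:
        return 0.0  # a reviewer with no history gets no content signal
    if agg == "max":
        return max(sims)  # rewards a single close match
    if agg == "mean":
        return sum(sims) / len(sims)  # rewards a consistently on-topic history
    raise ValueError(f"unknown aggregation: {agg}")


def recommend(paper: str, reviewers: dict[str, list[str]], k: int = 3, agg: str = "max") -> list[str]:
    """Hypothetical helper: rank reviewer IDs by aggregated content similarity
    and return the top k."""
    ranked = sorted(
        reviewers,
        key=lambda r: score_reviewer(paper, reviewers[r], agg),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    reviewers = {
        "r1": ["deep learning for protein structure prediction", "graph neural networks"],
        "r2": ["clinical trial design in oncology", "cancer immunotherapy outcomes"],
    }
    # r1 shares vocabulary with the submission; r2 shares none.
    print(recommend("protein structure prediction with graph neural networks", reviewers, k=1))
```

In a language-model pipeline, `embed` would be replaced by an encoder that produces dense vectors, while the aggregation step stays the same. Max aggregation favors a reviewer with one closely matching paper; mean aggregation favors reviewers whose whole history is on-topic.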
