Reviewer recommendation is critical to the efficiency of academic publishing workflows. However, research in this area has been persistently hindered by the lack of high-quality benchmark datasets: existing datasets are often limited in scale and disciplinary scope, and comparative analyses of different methodologies are scarce. To address this gap, we introduce FRONTIER-RevRec, a large-scale dataset constructed from authentic peer review records (2007-2025) from the Frontiers open-access publishing platform (https://www.frontiersin.org/). The dataset contains 177,941 distinct reviewers and 478,379 papers across 209 journals spanning multiple disciplines, including clinical medicine, biology, psychology, engineering, and social sciences. Our comprehensive evaluation on this dataset reveals that content-based methods significantly outperform collaborative filtering. Our structural analysis explains this finding by uncovering fundamental differences between academic recommendation and commercial recommendation domains. Notably, approaches leveraging language models are particularly effective at capturing the semantic alignment between a paper's content and a reviewer's expertise. Furthermore, our experiments identify the aggregation strategies that work best within the recommendation pipeline. FRONTIER-RevRec is intended to serve as a comprehensive benchmark to advance research in reviewer recommendation and to facilitate the development of more effective academic peer review systems. The FRONTIER-RevRec dataset is available at https://anonymous.4open.science/r/FRONTIER-RevRec-5D05.
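To make the idea of content-based reviewer matching concrete, the following is a minimal sketch and not the pipeline evaluated in this work. It assumes that "aggregation" means combining similarity scores across a reviewer's past papers, and it uses a toy bag-of-words vector as a stand-in for a language-model embedding. The function names, the `max`/`mean` options, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a language-model embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_reviewer(submission, past_papers, aggregate="max"):
    """Aggregate submission-to-paper similarities into one reviewer score.

    Assumption: the reviewer is represented by the texts of their past papers.
    """
    query = embed(submission)
    sims = [cosine(query, embed(p)) for p in past_papers]
    if not sims:
        return 0.0
    if aggregate == "max":
        return max(sims)
    return sum(sims) / len(sims)  # "mean" aggregation


def rank_reviewers(submission, reviewers, aggregate="max"):
    """Return reviewer names sorted from best to worst match."""
    scores = {
        name: score_reviewer(submission, papers, aggregate)
        for name, papers in reviewers.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data, not drawn from FRONTIER-RevRec.
reviewers = {
    "reviewer_a": [
        "deep learning for medical image segmentation",
        "neural networks in radiology",
    ],
    "reviewer_b": [
        "social media use and adolescent wellbeing",
        "survey methods in psychology",
    ],
}
submission = "neural networks for tumor segmentation in medical images"
ranking = rank_reviewers(submission, reviewers, aggregate="max")
print(ranking)
```

Unlike collaborative filtering, this kind of scoring needs no prior interaction history between the submission and any reviewer, which is one reason content signals suit a setting where every submitted paper is new. In a realistic setup, `embed` would be replaced by a sentence or document encoder, and the choice between `max` and `mean` aggregation would be made empirically.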