Open peer review is a growing trend in academic publishing. Public access to peer review data can benefit both the academic and publishing communities. It also supports research on review comment generation and, further, the realization of automated scholarly paper review. However, most existing peer review datasets do not cover the whole peer review process. Moreover, their data are not sufficiently diverse, as they are collected mainly from the field of computer science. These two drawbacks need to be addressed to unlock more opportunities for related studies. In response, we construct MOPRD, a multidisciplinary open peer review dataset. The dataset consists of paper metadata, multiple manuscript versions, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. We also propose a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance, as indicated by both automatic metrics and human evaluation. We further explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD provides strong support for further peer review-related research and other applications.