Citing comprehensively and appropriately has become challenging with the explosive growth of scientific publications. Current citation recommendation systems recommend a list of scientific papers for a given text context or draft paper. However, no existing work focuses on the citations already included in full papers, which are often imperfect and leave much room for improvement. In peer review, reviewers commonly identify submissions as missing vital citations, which can harm the credibility and validity of the research presented. To help improve the citations of full papers, we first define a novel task, Recommending Missed Citations Identified by Reviewers (RMC), and construct a corresponding expert-labeled dataset called CitationR. We conduct an extensive evaluation of several state-of-the-art methods on CitationR. Furthermore, we propose a new framework, RMCNet, with an Attentive Reference Encoder module that mines the relevance between papers, already-made citations, and missed citations. Empirical results show that RMC is challenging, with the proposed architecture outperforming previous methods on all metrics. We release our dataset and benchmark models to motivate future research on this challenging new task.