Existing machine learning approaches to local citation recommendation directly map or translate a query, typically a claim or an entity mention, to citation-worthy research papers. Under this formulation it is hard to pinpoint why a specific paper should be cited for a particular query, which limits the interpretability of the recommendations. To alleviate this, we introduce the evidence-grounded local citation recommendation task, in which the target latent space comprises evidence spans that support recommending specific papers. Our proposed system, ILCiteR, uses a distantly supervised evidence retrieval and multi-step re-ranking framework to recommend papers to cite for a query, grounded in similar evidence spans extracted from the existing research literature. Unlike past formulations that simply output recommendations, ILCiteR retrieves ranked lists of (evidence span, recommended paper) pairs. Moreover, previously proposed neural models for citation recommendation require expensive training on massive labeled data, ideally repeated after every significant update to the pool of candidate papers. In contrast, ILCiteR relies solely on distant supervision from a dynamic evidence database and on pre-trained Transformer-based language models, without any model training. We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.
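To make the idea of evidence-grounded recommendation concrete, the following is a minimal sketch and not the ILCiteR implementation. It assumes a tiny hypothetical evidence database of (evidence span, cited paper) pairs, replaces the pre-trained Transformer-based scorers with two simple bag-of-words similarity functions so that the example stays self-contained, and uses reciprocal rank fusion as a stand-in for the rank-ensembling step, whose exact (conditional) rule the abstract does not specify. All names (`EVIDENCE_DB`, `recommend`, `fuse`, and the paper identifiers) are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical evidence database: (evidence span, cited paper id) pairs.
# In a real system these spans would be mined from existing papers.
EVIDENCE_DB = [
    ("we fine-tune a pre-trained transformer language model", "paper_A"),
    ("transformer language models are pre-trained on large corpora", "paper_A"),
    ("graph convolutional networks aggregate neighbor features", "paper_B"),
    ("citation recommendation with neural ranking models", "paper_C"),
]


def tokens(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def cosine_score(query, span):
    """Cosine similarity between bag-of-words count vectors."""
    q, s = Counter(tokens(query)), Counter(tokens(span))
    dot = sum(q[t] * s[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in s.values())
    )
    return dot / norm if norm else 0.0


def jaccard_score(query, span):
    """Jaccard overlap between the token sets of query and span."""
    q, s = set(tokens(query)), set(tokens(span))
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def rank(query, scorer):
    """Indices into EVIDENCE_DB, sorted by descending score (ties keep DB order)."""
    return sorted(
        range(len(EVIDENCE_DB)),
        key=lambda i: -scorer(query, EVIDENCE_DB[i][0]),
    )


def fuse(rankings, k=60):
    """Reciprocal rank fusion: a stand-in for the paper's rank ensembling."""
    fused = Counter()
    for ranking in rankings:
        for position, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + position)
    return sorted(fused, key=lambda i: -fused[i])


def recommend(query, top_n=3):
    """Return ranked (evidence span, recommended paper) pairs for a query."""
    order = fuse([rank(query, cosine_score), rank(query, jaccard_score)])
    return [EVIDENCE_DB[i] for i in order[:top_n]]


if __name__ == "__main__":
    for span, paper in recommend("pre-trained transformer language model"):
        print(f"{paper}: {span}")
```

Because every recommendation is returned together with the evidence span that triggered it, a reader can inspect why a paper was suggested. Adding new papers only requires appending pairs to the evidence database; nothing is retrained, which mirrors the training-free property claimed above.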