ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation

Existing machine learning approaches for local citation recommendation directly map or translate a query, which is typically a claim or an entity mention, to citation-worthy research papers. Within such a formulation, it is challenging to pinpoint why one should cite a specific research paper for a particular query, which limits the interpretability of the recommendations. To alleviate this, we introduce the evidence-grounded local citation recommendation task, where the target latent space comprises evidence spans for recommending specific papers. Using a distantly-supervised evidence retrieval and multi-step re-ranking framework, our proposed system, ILCiteR, recommends papers to cite for a query, grounded on similar evidence spans extracted from the existing research literature. Unlike past formulations that simply output recommendations, ILCiteR retrieves ranked lists of (evidence span, recommended paper) pairs. Moreover, previously proposed neural models for citation recommendation require expensive training on large amounts of labeled data, ideally repeated after every significant update to the pool of candidate papers. In contrast, ILCiteR relies solely on distant supervision from a dynamic evidence database and pre-trained Transformer-based language models, without any model training. We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.
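To make the task formulation concrete, the following is a minimal sketch of a retrieve-then-re-rank loop over an evidence database, and is not ILCiteR's actual implementation. The toy database, the paper identifiers, and the function names are all hypothetical. Token-overlap (Jaccard) retrieval and character-trigram cosine re-ranking are simple stand-ins for the pre-trained Transformer-based scoring and the conditional neural rank-ensembling described above, whose details the abstract does not give.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy evidence database: (evidence span, cited paper id) pairs.
# In the setting described above, such pairs would be harvested from existing papers.
EVIDENCE_DB = [
    ("we fine-tune BERT for sentence classification", "paper:bert"),
    ("BERT is a pre-trained bidirectional Transformer encoder", "paper:bert"),
    ("we optimize the network with the Adam optimizer", "paper:adam"),
    ("word vectors are initialized with GloVe embeddings", "paper:glove"),
]


def jaccard(a, b):
    """Token-set overlap; an assumed stand-in for first-stage retrieval."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def trigram_cosine(a, b):
    """Cosine similarity of character-trigram counts; an assumed stand-in
    for similarity computed with a pre-trained language model."""
    def grams(s):
        s = s.lower()
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    ga, gb = grams(a), grams(b)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm = sqrt(sum(v * v for v in ga.values())) * sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def recommend(query, db=EVIDENCE_DB, k=3):
    """Return up to k (evidence span, paper id) pairs, best match first."""
    # Stage 1: retrieve the k spans with the highest token overlap.
    candidates = sorted(db, key=lambda pair: jaccard(query, pair[0]), reverse=True)[:k]
    # Stage 2: re-rank the retrieved spans with the second scorer.
    return sorted(candidates, key=lambda pair: trigram_cosine(query, pair[0]), reverse=True)


for span, paper in recommend("we train with the Adam optimizer"):
    print(paper, "<-", span)
```

Each recommendation comes paired with the evidence span that triggered it, which is the source of interpretability in this formulation. Because nothing is trained, adding a new candidate paper in this sketch only means appending its evidence spans to the database.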

