Proper citation of relevant literature is essential for contextualising and validating scientific contributions. While current citation recommendation systems leverage local and global textual information, they often overlook the nuances of human citation behaviour. Recent methods that incorporate such patterns improve performance but incur high computational costs and introduce systematic biases into downstream rerankers. To address this, we propose Profiler, a lightweight, non-learnable module that captures human citation patterns efficiently and without bias, significantly enhancing candidate retrieval. Furthermore, we identify a critical limitation in current evaluation protocols: systems are assessed in a transductive setting, which fails to reflect real-world scenarios. We introduce a rigorous inductive evaluation setting that enforces strict temporal constraints, simulating the recommendation of citations for newly authored papers in the wild. Finally, we present DAVINCI, a novel reranking model that integrates Profiler-derived confidence priors with semantic information via an adaptive vector-gating mechanism. Our system achieves new state-of-the-art results across multiple benchmark datasets, demonstrating superior efficiency and generalisability.
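To make the inductive evaluation setting concrete, the following is a minimal sketch of a temporal split, not the paper's actual protocol. It assumes that each paper carries a publication year, that test queries are all papers at or after a cutoff year, and that a query may only be recommended papers published strictly before it. The names `Paper`, `inductive_split`, `candidate_pool`, and `cutoff_year` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record type; the field names are assumptions for this sketch.
    paper_id: str
    year: int
    cited_ids: tuple = ()


def inductive_split(papers, cutoff_year):
    """Split papers so that every test query is newer than every training paper."""
    train = [p for p in papers if p.year < cutoff_year]
    test = [p for p in papers if p.year >= cutoff_year]
    return train, test


def candidate_pool(query, papers):
    """Return IDs of papers published strictly before the query (assumed constraint)."""
    return [p.paper_id for p in papers if p.year < query.year]


# Toy corpus: four papers with publication years and the papers they cite.
papers = [
    Paper("A", 2018),
    Paper("B", 2019, ("A",)),
    Paper("C", 2021, ("A", "B")),
    Paper("D", 2022, ("C",)),
]

train, test = inductive_split(papers, cutoff_year=2021)
for query in test:
    print(query.paper_id, candidate_pool(query, papers))
```

In a transductive setting, by contrast, the query papers would already be present in the corpus the system was fitted on, so the split above is what separates the two protocols in this sketch.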