Citation recommendation has attracted considerable academic interest, resulting in numerous studies and implementations. These systems suggest relevant references for the text an author has written, helping to generate appropriate citations automatically. However, the methods used vary widely across studies and implementations: some approaches model the overall content of papers, while others rely on the local context of the citing sentence. The datasets used likewise cover different aspects of papers, such as metadata, citation context, or the full text in various formats and structures. This diversity of models, datasets, and evaluation metrics makes it difficult to assess and compare citation recommendation methods. Addressing this requires a standardized dataset and a common set of evaluation metrics against which models can be evaluated consistently. We therefore propose a benchmark designed specifically to analyze and compare citation recommendation models. The benchmark will evaluate model performance on different features of the citation context and provide a comprehensive evaluation across all of these tasks, reporting results in a standardized way. With a benchmark and standardized evaluation metrics, researchers and practitioners in citation recommendation will have a common platform for assessing and comparing models, enabling meaningful comparisons and helping to identify promising approaches for further research and development in the field.
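As a point of reference, the sketch below illustrates two ranking metrics commonly reported in citation recommendation evaluation, Recall@k and Mean Reciprocal Rank (MRR). It is a minimal illustration only; the identifiers and data are hypothetical, and the metric suite actually adopted by the proposed benchmark is not specified here.

```python
# Minimal sketch of two standard ranking metrics for citation recommendation.
# All context and paper identifiers below are hypothetical.
from typing import Dict, List, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the ground-truth cited papers found in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(runs: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """Average over citation contexts of 1 / rank of the first correctly recommended paper."""
    scores = []
    for context_id, ranked in runs.items():
        relevant = gold.get(context_id, set())
        rr = 0.0
        for rank, paper_id in enumerate(ranked, start=1):
            if paper_id in relevant:
                rr = 1.0 / rank
                break
        scores.append(rr)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical model output (ranked recommendations) and ground-truth citations.
runs = {"ctx1": ["p3", "p7", "p1"], "ctx2": ["p9", "p2", "p5"]}
gold = {"ctx1": {"p1"}, "ctx2": {"p2", "p8"}}

print(recall_at_k(runs["ctx1"], gold["ctx1"], k=3))  # 1.0
print(mean_reciprocal_rank(runs, gold))              # (1/3 + 1/2) / 2 ≈ 0.417
```

Reporting such metrics in a uniform way across models and datasets is the kind of standardized output the proposed benchmark aims to provide.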