We propose a method for measuring the similarity of papers and authors by simulating a literature search procedure on citation networks, a conceptualization of similarity inspired by information retrieval. This transition probability (TP) based approach does not require a curated classification system, avoids the complications of clustering, and provides a continuous measure of similarity. We use several test scenarios to explore multiple versions of the general TP concept alongside the Node2vec machine-learning technique. We find that TP measures outperform Node2vec in mapping the macroscopic structure of fields. The paper also discusses how to implement TP similarity measurement in general, with a particular focus on using publication-level information to approximate the research interest similarity of individual scientists. It is accompanied by a Python package that can calculate all the tested metrics.
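To make the general idea of a transition probability on a citation network concrete, the following is a minimal sketch and not the paper's definition or the API of the accompanying package. It assumes that a simulated searcher moves along citation links in either direction, choosing uniformly among neighbours, and that similarity is read off as the probability of reaching one paper from another in a fixed number of steps. The function names `transition_matrix` and `tp_similarity`, the undirected treatment of links, and the toy network are all hypothetical choices made for illustration.

```python
import numpy as np


def transition_matrix(adj):
    """Row-normalize an adjacency matrix into one-step transition probabilities."""
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1, keepdims=True)
    # Rows without any link stay all-zero instead of triggering a division by zero.
    return np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)


def tp_similarity(cites, steps=2):
    """Probability of reaching paper j from paper i in `steps` uniform random moves.

    Assumption (not from the paper): citation links are followed in both
    directions, so the directed citation matrix is symmetrized first.
    """
    cites = np.asarray(cites)
    undirected = np.maximum(cites, cites.T)
    one_step = transition_matrix(undirected)
    return np.linalg.matrix_power(one_step, steps)


# Toy network: papers 0 and 1 both cite paper 2; paper 3 cites paper 4.
cites = np.zeros((5, 5), dtype=int)
cites[0, 2] = 1
cites[1, 2] = 1
cites[3, 4] = 1

sim = tp_similarity(cites, steps=2)
print(sim[0, 1])  # papers sharing a reference get a non-zero score
print(sim[0, 3])  # papers in disconnected parts of the network score zero
```

In this toy case the score is continuous and needs no predefined categories or clusters, which mirrors the properties claimed for the TP approach; the specific walk rules tested in the paper may differ.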