Introduction: Tracing the spread of ideas and the presence of influence is a question of particular importance across disciplines ranging from intellectual history to cultural analytics, computational social science, and the science of science. Method: We collect a corpus of open-access journal articles, generate knowledge graph representations of them using the Gemini LLM, and attempt to predict whether a citation exists between sampled pairs of articles using both previously published methods and a novel Graph Neural Network (GNN)-based embedding model. Results: We demonstrate that our knowledge graph embedding method outperforms the previously published methods at distinguishing pairs of articles with and without citations. Once trained, the model runs efficiently and can be fine-tuned on specific corpora to suit individual researchers' needs. Conclusion(s): This experiment demonstrates that the relationships encoded in a knowledge graph, especially the types of concepts brought together by specific relations, can carry information capable of revealing intellectual influence. This suggests that further analysis of document-level knowledge graphs to uncover their latent structures could yield valuable insights.
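The abstract does not specify the architecture of the GNN-based embedding model, so the following is only a rough illustration of the pairwise setup, not the authors' method. It assumes each article's knowledge graph is a list of (head, relation, tail) triples, embeds it with untrained, relation-aware mean message passing, mean-pools the node states into one document vector, and scores a pair of articles by cosine similarity. The hash-seeded vectors, the names `embed_graph` and `predict_citation`, the embedding size, and the 0.5 threshold are all hypothetical; a real model would learn its parameters from cited and non-cited pairs, and that training step is omitted here.

```python
import hashlib
import math
from collections import defaultdict

DIM = 16  # hypothetical embedding size


def seed_vector(name: str, dim: int = DIM) -> list[float]:
    """Deterministic pseudo-random vector in [-1, 1] for a label (stand-in for learned weights)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()  # 32 bytes, each 0..255
    return [digest[i % len(digest)] / 127.5 - 1.0 for i in range(dim)]


def embed_graph(triples: list[tuple[str, str, str]], layers: int = 2) -> list[float]:
    """Embed one document-level knowledge graph with relation-aware mean message passing."""
    if not triples:
        raise ValueError("knowledge graph has no triples")
    # Sort nodes so the result does not depend on set iteration order.
    nodes = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    state = {n: seed_vector("ent:" + n) for n in nodes}

    # Each edge carries a relation vector; edges are treated as undirected here.
    neighbors = defaultdict(list)
    for head, rel, tail in triples:
        r = seed_vector("rel:" + rel)
        neighbors[tail].append((head, r))
        neighbors[head].append((tail, r))

    for _ in range(layers):
        new_state = {}
        for n in nodes:
            # Message from neighbor u = relation vector * neighbor state (elementwise).
            msgs = [[rv * hv for rv, hv in zip(r, state[u])] for u, r in neighbors[n]]
            agg = [sum(col) / len(msgs) for col in zip(*msgs)]
            # Combine self state with the mean message, squashed by tanh.
            new_state[n] = [math.tanh(a + b) for a, b in zip(state[n], agg)]
        state = new_state

    # Mean-pool node states into a single document vector.
    return [sum(col) / len(nodes) for col in zip(*(state[n] for n in nodes))]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_citation(kg_a, kg_b, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: predict a citation when the pair's similarity exceeds a threshold."""
    return cosine(embed_graph(kg_a), embed_graph(kg_b)) >= threshold


if __name__ == "__main__":
    kg_a = [("GNN", "applied_to", "citation prediction"), ("knowledge graph", "input_to", "GNN")]
    kg_b = [("knowledge graph", "generated_by", "Gemini"), ("GNN", "applied_to", "link prediction")]
    print(cosine(embed_graph(kg_a), embed_graph(kg_b)), predict_citation(kg_a, kg_b))
```

In a trained version, the hash-seeded vectors would be replaced by learned entity and relation parameters, and the fixed threshold by a classifier fitted on labeled article pairs; the pairwise structure (embed each graph, then compare) is the part this sketch is meant to show.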