SciConNav: Knowledge navigation through contextual learning of extensive scientific research trajectories

New knowledge builds upon existing foundations, so pieces of knowledge are interdependent, a relationship reflected in the historical development of the scientific system over hundreds of years. Leveraging natural language processing techniques, this study introduces the Scientific Concept Navigator (SciConNav), an embedding-based navigation model that infers the "knowledge pathway" from the research trajectories of millions of scholars. We validate that the learned representations effectively delineate disciplinary boundaries and capture the intricate relationships between diverse concepts. We showcase the utility of the inferred navigation space through multiple applications. First, we demonstrate multi-step analogy inferences within the knowledge space and the interconnectivity between concepts in different disciplines. Second, we formulate attribute dimensions of knowledge across domains and observe distributional shifts in the arrangement of 19 disciplines along these conceptual dimensions, including "Theoretical" to "Applied" and "Chemical" to "Biomedical", highlighting the evolution of functional attributes within knowledge domains. Lastly, by analyzing the high-dimensional knowledge network structure, we find that knowledge is connected through shorter global pathways and that interdisciplinary knowledge plays a critical role in the accessibility of the global knowledge network. Our framework offers a novel approach to mining knowledge inheritance pathways in extensive scientific literature, which is of great significance for understanding patterns of scientific progression, tailoring scientific learning trajectories, and accelerating scientific progress.
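
To make the idea of an attribute dimension concrete, the following minimal sketch shows one common way to build such an axis from concept embeddings and place concepts along it. It is an illustration under stated assumptions, not the SciConNav implementation: the three-dimensional vectors in `TOY`, the choice of pole concepts, and the helper names `attribute_axis` and `project` are all hypothetical, and the abstract does not specify how the axes are actually constructed.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def attribute_axis(emb, pole_a, pole_b):
    """Unit direction pointing from the pole_a concepts toward the pole_b concepts."""
    a = mean_vector([emb[c] for c in pole_a])
    b = mean_vector([emb[c] for c in pole_b])
    return normalize([y - x for x, y in zip(a, b)])

def project(emb, concept, axis):
    """Position of a concept on the axis, as a cosine similarity in [-1, 1]."""
    return cosine(emb[concept], axis)

# Hypothetical toy embeddings, invented for illustration only.
TOY = {
    "number theory": [0.9, 0.1, 0.0],
    "topology": [0.8, 0.2, 0.1],
    "statistics": [0.5, 0.5, 0.1],
    "pharmacology": [0.2, 0.8, 0.3],
    "civil engineering": [0.1, 0.9, 0.0],
}

# Assumed poles for a "Theoretical" to "Applied" axis.
axis = attribute_axis(TOY, ["number theory"], ["civil engineering"])
scores = {concept: project(TOY, concept, axis) for concept in TOY}
ranking = sorted(scores, key=scores.get)  # most "theoretical" first

for concept in ranking:
    print(f"{concept:18s} {scores[concept]:+.3f}")
```

In this toy setting, negative scores fall toward the assumed "Theoretical" pole and positive scores toward the assumed "Applied" pole. With real embeddings, each pole would more plausibly be defined by averaging several representative concepts rather than a single one.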
