Forecasting Conceptual Diffusion in Science: The Case of Quantum Computing

From arXiv: 19 pages, 5 figures, 6 tables. Code and manuscript sources: https://github.com/wazaahhh/breakthroughs-diffusion. An earlier version was presented at the Global Tech Mining Conference (GTM) 2026 (submission #117).

Understanding and anticipating scientific change requires models that distinguish between endogenous consolidation and exogenous diffusion of scientific concepts. Using the quantum computing subtree of concepts in OpenAlex, we construct a temporally resolved concept co-occurrence network and track each concept pair through its upstream citation lineage and downstream diffusion. We train LightGBM models on distributional and diversity-aware features to predict four outcomes: endogenous reinforcement, exogenous diffusion, their ratio, and diffusion entropy. After controlling for overall publication growth of the scientific body, endogenous reinforcement proves largely unpredictable in the primary quantum-computing benchmark. In contrast, exogenous diffusion and entropy are strongly predictable ($R^2$ up to $0.78$) and, as SHAP analyses show, are driven by upstream heterogeneity, citation breadth, and distributional dispersion. Replications on robotics, advanced materials, and neuro implants confirm that exogenous diffusion remains the top-ranked target across fields ($R^2_{\text{test}} \sim 0.60$–$0.87$), while endogenous predictability rises markedly in neuro implants ($R^2_{\text{test}} = 0.83$), indicating that the quantum-computing asymmetry does not generalise uniformly. Case studies reveal that sharp entropy increases coincide with the opening of new conceptual frontiers, while entropy collapses signal technological convergence or paradigm displacement. These results demonstrate that conceptual diffusion is governed by stable structural regularities embedded in semantic and citation environments. By identifying early diversity-based signals of cross-domain uptake, the approach provides a scalable foundation for anticipatory scientometrics, technology foresight, and innovation-oriented policy analysis in rapidly evolving research fields.

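To make the four prediction targets more concrete, here is a minimal sketch of how endogenous reinforcement, exogenous diffusion, their ratio, and diffusion entropy might be computed for a single concept pair. The abstract does not give the paper's exact definitions, so everything here is an assumption: the function name `diffusion_summary`, the idea of one field label per downstream citing paper, and the use of Shannon entropy (in bits) over the out-of-field labels are illustrative only. The authors' repository linked above is the reference for the actual method.

```python
import math
from collections import Counter


def diffusion_summary(citing_fields, home_field):
    """Summarise where the downstream citations of a concept pair come from.

    citing_fields: one field label per downstream citing paper
                   (hypothetical input format, not taken from the paper).
    home_field:    the label treated as the endogenous ("home") field.
    """
    counts = Counter(citing_fields)

    # Endogenous reinforcement: uptake from inside the home field.
    endogenous = counts.get(home_field, 0)

    # Exogenous diffusion: uptake from every other field.
    exogenous_counts = {f: n for f, n in counts.items() if f != home_field}
    exogenous = sum(exogenous_counts.values())

    # Assumed entropy definition: Shannon entropy (bits) of the
    # distribution of exogenous citations over fields.
    entropy = 0.0
    for n in exogenous_counts.values():
        p = n / exogenous
        entropy -= p * math.log2(p)

    # The ratio is undefined when there is no endogenous uptake.
    ratio = exogenous / endogenous if endogenous else None

    return {
        "endogenous": endogenous,
        "exogenous": exogenous,
        "ratio": ratio,
        "entropy_bits": entropy,
    }


if __name__ == "__main__":
    # Toy data: 4 in-field citations, 4 out-of-field split over two fields.
    fields = ["quantum computing"] * 4 + ["chemistry"] * 2 + ["finance"] * 2
    print(diffusion_summary(fields, "quantum computing"))
```

Under this assumed definition, entropy is zero when all outside uptake comes from a single field and grows as uptake spreads evenly over more fields, which matches the abstract's reading of entropy rises as new frontiers opening and entropy collapses as convergence.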