Many real-world networks exhibit hierarchical, tree-like structure and heavy-tailed degree distributions, phenomena that standard statistical models for network data do not readily capture. Extensions of the popular continuous latent space modeling framework have been proposed to accommodate such networks. Drawing on insights from statistical physics, continuous latent space models with underlying hyperbolic geometry offer a natural framework: they probabilistically embed nodes in a latent Riemannian manifold with constant negative curvature. Most statistical implementations, however, simplify the original physics-based model by omitting the "temperature parameter," which controls the sharpness of the mapping from latent distance to edge probability. We argue that this omission is consequential. We demonstrate that temperature is the fundamental parameter governing a network's tree-like topology, and that failing to infer it weakens model expressiveness. We formalize a Bayesian hyperbolic continuous latent space model with an unknown, learnable temperature parameter. We then develop two inferential procedures: a Hamiltonian Monte Carlo approach for rigorous posterior characterization and a scalable auto-encoding variational Bayes algorithm for large-scale networks. In simulations and real-data examples, our model outperforms, in most settings, both models with fixed temperature and models with misspecified Euclidean geometry on graph reconstruction tasks, indicating that temperature is a crucial and inferable feature of complex networks.
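To make the role of temperature concrete, the following minimal sketch shows how a temperature parameter can sharpen or flatten the map from latent hyperbolic distance to edge probability. The abstract does not state the functional form, so the sketch assumes the Fermi-Dirac form common in the physics literature on hyperbolic random graphs, p(d) = 1 / (1 + exp((d - R) / (2T))), with points in native polar coordinates on a space of curvature -1. The function names and the symbols `radius` (R) and `temperature` (T) are illustrative assumptions, not the paper's notation.

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points given in native polar coordinates
    on the hyperbolic plane with curvature -1 (hyperbolic law of cosines)."""
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(theta1 - theta2))
    # Guard against rounding that pushes the argument slightly below 1.
    return math.acosh(max(arg, 1.0))


def link_probability(d, radius, temperature):
    """Assumed Fermi-Dirac form: 1 / (1 + exp((d - radius) / (2 * temperature))).

    Small temperature gives a near-step function at d = radius;
    large temperature flattens the curve toward 1/2.
    """
    z = (d - radius) / (2.0 * temperature)
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    R = 5.0  # hypothetical radius parameter
    d = hyperbolic_distance(2.0, 0.0, 2.5, 0.3)
    for T in (0.05, 0.5, 5.0):
        print(f"T={T}: p(edge) = {link_probability(d, R, T):.4f}")
```

Under this assumed form, a pair at distance exactly R connects with probability 1/2 at every temperature, while pairs nearer or farther than R are pushed toward 1 or 0 as T shrinks. Fixing T therefore fixes how sharply distance determines connectivity, which is the quantity the abstract proposes to infer.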