Latent space models for network data characterize each node through a vector of latent features whose pairwise similarities define the edge probabilities among pairs of nodes. Although this formulation has led to successful implementations, the overarching focus has been on directly inferring node embeddings through the latent features, rather than on learning the generative process underlying these embeddings. This focus prevents borrowing of information across the node features and limits the ability to infer higher-level architectures governing network formation. For example, routinely studied networks often exhibit multiscale structures that reflect nested modular hierarchies among nodes, which could be learned via tree-based representations of dependencies among the latent features. We pursue this direction by bridging latent variable representations of network data with concepts from phylogenetic inference to design a novel latent space model that explicitly characterizes the generative process of the node feature vectors through a branching Brownian motion, with branching structure parametrized by a tree. This tree constitutes the main object of interest and is learned under a Bayesian perspective, leveraging priors inherited from the phylogenetic literature to infer tree-based modular hierarchies across nodes, which explain heterogeneous multiscale patterns in the network. Identifiability results are derived along with posterior consistency theory. The inferential potential of our model is illustrated in simulations and in two real-data applications from criminology and neuroscience, where our formulation learns core structures hidden from state-of-the-art alternatives.
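To make the generative idea concrete, the following is a minimal simulation sketch of a Brownian motion propagated along a fixed rooted tree, with the leaf features then mapped to edge probabilities. It is not the authors' implementation: the abstract does not specify the link function, the feature dimension, or the tree encoding, so the logistic link on negative Euclidean distance, the `intercept` and `sigma` parameters, the parent-array encoding, and all function names below are assumptions chosen for illustration.

```python
import numpy as np


def simulate_tree_features(parent, branch_length, dim=2, sigma=1.0, seed=0):
    """Propagate a Brownian motion from the root down a rooted tree.

    parent[k] is the index of the parent of tree vertex k (-1 for the root).
    Vertices are assumed to be ordered so that parents precede children.
    """
    rng = np.random.default_rng(seed)
    n_vertices = len(parent)
    z = np.zeros((n_vertices, dim))
    for k in range(n_vertices):
        if parent[k] < 0:
            continue  # the root starts at the origin (an assumption)
        # Gaussian increment whose variance is proportional to the branch length
        step = sigma * np.sqrt(branch_length[k]) * rng.standard_normal(dim)
        z[k] = z[parent[k]] + step
    return z


def edge_probabilities(z_leaves, intercept=1.0):
    """Assumed link: logistic function of (intercept - Euclidean distance)."""
    diff = z_leaves[:, None, :] - z_leaves[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    prob = 1.0 / (1.0 + np.exp(-(intercept - dist)))
    np.fill_diagonal(prob, 0.0)  # no self-loops
    return prob


def sample_network(prob, seed=0):
    """Draw a symmetric binary adjacency matrix with independent edges."""
    rng = np.random.default_rng(seed)
    n = prob.shape[0]
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(int)


# Toy tree: root 0, internal vertices 1 and 2, leaves 3 and 4 under vertex 1,
# leaves 5 and 6 under vertex 2. Long internal branches separate the two
# groups; short terminal branches keep leaves within a group close together.
parent = [-1, 0, 0, 1, 1, 2, 2]
branch_length = [0.0, 2.0, 2.0, 0.1, 0.1, 0.1, 0.1]
leaves = [3, 4, 5, 6]

z = simulate_tree_features(parent, branch_length)
prob = edge_probabilities(z[leaves])
adj = sample_network(prob)
```

In this sketch, leaves that share a recent common ancestor inherit most of their trajectory from it, so their features tend to be close and their edge probability tends to be high. This is one way a tree can induce nested modular structure in the resulting network. Inferring the tree from an observed network, which is the focus of the abstract, is not attempted here.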