Latent space models (LSMs) are frequently used to model network data by embedding a network's nodes into a low-dimensional latent space; however, choosing the dimension of this space remains a challenge. To this end, we begin by formalizing a class of LSMs we call generalized linear network eigenmodels (GLNEMs) that can model various edge types (binary, ordinal, non-negative continuous) found in scientific applications. This model class subsumes the traditional eigenmodel by embedding it in a generalized linear model with an exponential dispersion family random component, and it fixes identifiability issues that hindered interpretability. Next, we propose a Bayesian approach to dimension selection for GLNEMs based on an ordered spike-and-slab prior that provides improved dimension estimation and satisfies several appealing theoretical properties. In particular, we show that the model's posterior concentrates on low-dimensional models near the truth. On simulated networks, we demonstrate that our approach selects the dimension consistently. Lastly, we use GLNEMs to study how covariates affect the formation of networks from biology, ecology, and economics, and whether residual latent structure exists.
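To make the model class concrete, the following is a minimal sketch of simulating a binary network from a logistic eigenmodel, one special case that a GLNEM with a Bernoulli random component and logit link would plausibly cover. The abstract does not give the model's exact form, so the linear predictor used here (a covariate term plus a weighted inner product of latent positions), the standard normal latent positions, and all names (`simulate_binary_eigenmodel`, `lam`, `beta`, `x`) are assumptions made for illustration, not the authors' specification.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate_binary_eigenmodel(n, lam, beta=0.0, x=None, seed=0):
    """Simulate an undirected binary network with n nodes.

    Assumed form (hypothetical, for illustration only):
        logit P(Y_ij = 1) = beta * x_ij + sum_k lam[k] * u_ik * u_jk
    where u_i is the latent position of node i and len(lam) is the
    latent dimension.
    """
    rng = random.Random(seed)
    d = len(lam)
    # Latent positions: independent standard normal draws (an assumption).
    u = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            x_ij = 0.0 if x is None else x[i][j]
            # Linear predictor: covariate effect plus latent-space term.
            eta = beta * x_ij + sum(l * a * b for l, a, b in zip(lam, u[i], u[j]))
            # Bernoulli edge, mirrored so the adjacency matrix is symmetric.
            y[i][j] = y[j][i] = int(rng.random() < sigmoid(eta))
    return y, u


if __name__ == "__main__":
    adjacency, positions = simulate_binary_eigenmodel(n=20, lam=[2.0, -1.0], seed=1)
    print(sum(map(sum, adjacency)) // 2, "edges among 20 nodes")
```

In this sketch, setting an entry of `lam` to zero removes that dimension from the linear predictor, so the effective latent dimension is the number of nonzero entries. That is the quantity a dimension-selection prior would need to recover, though the form of the ordered spike-and-slab prior itself is not specified in the abstract and is not modeled here.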