Low-dimensional representation and clustering of network data are tasks of great interest across various fields. Latent position models are routinely used for this purpose; they assume that each node has a location in a low-dimensional latent space, which also enables node clustering. However, these models cannot simultaneously determine the optimal latent space dimension and the number of clusters. Here we introduce the latent shrinkage position cluster model (LSPCM), which addresses this limitation. The LSPCM places a Bayesian nonparametric shrinkage prior on the variance parameters of the latent positions, so that higher dimensions have increasingly smaller variances; this helps identify the dimensions with non-negligible variance. Further, the LSPCM assumes that the latent positions follow a sparse finite Gaussian mixture model, so that the number of clusters is inferred automatically as the number of non-empty mixture components. As a result, the LSPCM simultaneously infers the latent space dimensionality and the number of clusters, eliminating the need to fit and compare multiple models. The performance of the LSPCM is assessed via simulation studies and demonstrated through application to two real Twitter network datasets from sporting and political contexts. Open source software is available to promote widespread use of the LSPCM.
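To give some intuition for the two ingredients described above, the following is a minimal sketch of prior draws only, not the LSPCM itself and not the authors' software. It assumes a multiplicative gamma-style construction for the shrinkage prior (with the multipliers for dimensions two and above truncated to be at least one, so that variances cannot increase with dimension) and a symmetric Dirichlet prior with a small concentration parameter for the sparse mixture weights. The function names, the hyperparameters `a1`, `a2` and `alpha`, and the truncation rule are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def sample_shrinkage_variances(p, a1=2.0, a2=3.0, seed=None):
    """Draw p dimension-wise variances from an assumed shrinkage prior.

    Assumption: precision of dimension h is the cumulative product of
    gamma multipliers, with multipliers for h >= 2 truncated to [1, inf).
    """
    rng = np.random.default_rng(seed)
    deltas = np.empty(p)
    deltas[0] = rng.gamma(a1, 1.0)
    for h in range(1, p):
        # Simple rejection sampling for the truncated gamma multiplier.
        draw = rng.gamma(a2, 1.0)
        while draw < 1.0:
            draw = rng.gamma(a2, 1.0)
        deltas[h] = draw
    precisions = np.cumprod(deltas)  # non-decreasing in the dimension index
    return 1.0 / precisions          # hence non-increasing variances


def sample_sparse_clusters(n, G, alpha=0.1, seed=None):
    """Draw mixture weights and labels from an assumed sparse finite mixture.

    A small alpha concentrates prior mass on a few components, so many of
    the G components typically receive no nodes.
    """
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.full(G, alpha))
    labels = rng.choice(G, size=n, p=weights)
    n_nonempty = np.unique(labels).size  # number of occupied components
    return weights, labels, n_nonempty


if __name__ == "__main__":
    variances = sample_shrinkage_variances(p=6, seed=0)
    print("variances by dimension:", np.round(variances, 4))
    _, _, k = sample_sparse_clusters(n=100, G=10, seed=1)
    print("non-empty components out of 10:", k)
```

In this sketch, deliberately overfitting both the number of dimensions `p` and the number of components `G` is the point: the shrinkage on the variances and the sparsity of the weights are what would let a fitted model discard the surplus, which is the mechanism the abstract describes for avoiding the comparison of multiple models.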