The increasing availability of multiple network data has highlighted the need for statistical models for heterogeneous populations of networks. A convenient framework makes use of metrics to measure similarity between networks. In this context, we propose a novel Bayesian nonparametric model that identifies clusters of networks characterized by similar connectivity patterns. Our approach relies on a location-scale Dirichlet process mixture of centered Erdős--Rényi kernels, with components parametrized by a unique network representative, or mode, and a univariate measure of dispersion around the mode. We demonstrate that this model has full support in the Kullback--Leibler sense and is strongly consistent. An efficient Markov chain Monte Carlo scheme is proposed for posterior inference and clustering of multiple network data. The performance of the model is validated through extensive simulation studies, showing improvements over state-of-the-art methods. Additionally, we present an effective strategy to extend the application of the proposed model to datasets with a large number of nodes. We illustrate our approach with the analysis of human brain network data.
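To make the mixture component concrete, the following is a minimal sketch of a single centered Erdős--Rényi kernel. It assumes the usual form of that kernel, which the abstract does not spell out: each potential edge of the mode is flipped independently with probability `alpha` in (0, 1/2), so the log-probability of a network depends only on its Hamming distance from the mode. The function names, the restriction to undirected binary graphs without self-loops, and the symbol `alpha` for the dispersion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hamming_distance(a, b):
    """Count edges that differ between two undirected graphs (upper triangle only)."""
    iu = np.triu_indices_from(a, k=1)
    return int(np.sum(a[iu] != b[iu]))


def cer_log_pmf(a, mode, alpha):
    """Log-probability of adjacency matrix `a` under an assumed CER(mode, alpha) kernel.

    Each of the n(n-1)/2 potential edges disagrees with the mode independently
    with probability alpha, giving
        log p(a) = d * log(alpha) + (m - d) * log(1 - alpha),
    where d is the Hamming distance to the mode and m the number of node pairs.
    """
    n = a.shape[0]
    m = n * (n - 1) // 2
    d = hamming_distance(a, mode)
    return d * np.log(alpha) + (m - d) * np.log1p(-alpha)


def sample_cer(mode, alpha, rng):
    """Draw one network by flipping each edge of the mode with probability alpha."""
    n = mode.shape[0]
    iu = np.triu_indices(n, k=1)
    flips = rng.random(len(iu[0])) < alpha
    a = mode.copy()
    a[iu] = np.where(flips, 1 - mode[iu], mode[iu])
    a = np.triu(a, k=1)      # keep the perturbed upper triangle only
    return a + a.T           # symmetrize; the diagonal stays zero


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical mode: a path graph on four nodes.
    mode = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])
    g = sample_cer(mode, alpha=0.1, rng=rng)
    print(hamming_distance(g, mode), cer_log_pmf(g, mode, alpha=0.1))
```

Under this assumed form, the mode plays the role of a location parameter and `alpha` that of a scale parameter: smaller values concentrate mass on networks close to the mode in Hamming distance. A Dirichlet process mixture would place a prior over (mode, `alpha`) pairs and assign each observed network to one such component.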