We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features, including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select both the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model with a novel application of the nested Dirichlet process (NDP) as a prior that jointly models the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms, including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers, which ultimately return two levels of clustering labels: within each network and across networks. Extensive simulation studies demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature, owing to the anonymity of the nodes and the varying number of nodes in each network.
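To make the two levels of clustering concrete, here is a minimal generative sketch in Python (NumPy). It assumes a truncated stick-breaking approximation of the NDP: each network draws a class label from top-level weights, and its nodes then draw community labels from that class's own community weights. As a result, networks of different sizes that share a class also share a community structure. The truncation levels `K` and `L`, the Beta priors on block connection probabilities, the hyperparameter names, and the helpers `stick_breaking` and `simulate_nsbm` are all hypothetical choices for illustration and are not the paper's specification. The sketch only simulates from the prior and does not implement any of the Gibbs samplers.

```python
import numpy as np


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights (hypothetical helper).

    v_h ~ Beta(1, alpha); the last stick is set to 1 so the weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # close the stick at the truncation level
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def simulate_nsbm(node_counts, alpha=1.0, beta=1.0, K=10, L=10,
                  a=1.0, b=1.0, seed=0):
    """Simulate unlabeled networks from an assumed truncated NDP-based NSBM.

    node_counts: number of nodes in each network (may differ across networks).
    Returns a list of (network class s, node community labels z, adjacency).
    """
    rng = np.random.default_rng(seed)
    pi = stick_breaking(alpha, K, rng)  # weights over network classes
    w = np.stack([stick_breaking(beta, L, rng) for _ in range(K)])  # (K, L)
    # Assumption: each class has its own symmetric block-probability matrix.
    eta = rng.beta(a, b, size=(K, L, L))
    eta = np.triu(eta) + np.triu(eta, 1).transpose(0, 2, 1)
    networks = []
    for n in node_counts:
        s = rng.choice(K, p=pi)              # between-network cluster
        z = rng.choice(L, p=w[s], size=n)    # within-network communities
        probs = eta[s][z][:, z]              # (n, n) edge probabilities
        upper = np.triu(rng.random((n, n)) < probs, k=1)
        adj = (upper | upper.T).astype(int)  # undirected, no self-loops
        networks.append((s, z, adj))
    return networks
```

For example, `simulate_nsbm([5, 8, 12], seed=1)` returns three networks with 5, 8, and 12 nodes. Fitting NSBM would mean recovering the class labels `s` and community labels `z` from the adjacency matrices alone.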