The paper tackles the problem of clustering multiple networks that do not share the same set of vertices into groups of networks with similar topology. A statistical model-based approach relying on a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood (ICL) criterion. This is done by a hierarchical agglomerative algorithm that starts from singleton clusters and successively merges clusters of networks. As such, the algorithm computes a sequence of nested clusterings that can be represented by a dendrogram, providing valuable insights into the collection of networks. Within a Bayesian framework, model selection is automated: the algorithm stops when the best number of clusters is attained. When carefully implemented, the algorithm is computationally efficient. Aggregating groups of networks requires a means of overcoming the label-switching problem of the stochastic block model and of matching the block labels of the graphs. To address this problem, a new tool is proposed, based on a comparison of the graphons of the associated stochastic block models. The clustering approach is assessed on synthetic data, and an application to a collection of ecological networks illustrates the interpretability of the obtained results.
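A minimal sketch of the greedy agglomerative search may help fix ideas. To keep it short and exact, it swaps in a deliberately simplified model, which is an assumption and not the mixture of stochastic block models described above: each network is reduced to an Erdős–Rényi graph (a one-block SBM) summarized by its edge count and its number of dyads. Networks with different vertex sets are then handled naturally, and no block-label matching is needed. With a Beta(a, b) prior on each cluster's edge density and a symmetric Dirichlet(alpha) prior on the mixture proportions, the ICL has a closed form. All function names and hyperparameters are hypothetical.

```python
from itertools import combinations
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def cluster_term(edges, dyads, a=1.0, b=1.0):
    """Log marginal likelihood of all dyads in a cluster sharing one edge
    density, integrated against a Beta(a, b) prior (simplifying assumption:
    one-block SBM, i.e. Erdos-Renyi, per cluster)."""
    return log_beta(a + edges, b + dyads - edges) - log_beta(a, b)


def partition_term(sizes, alpha=1.0):
    """Log marginal probability of the cluster labels under a symmetric
    Dirichlet(alpha) prior on the mixture proportions."""
    k, n = len(sizes), sum(sizes)
    return (lgamma(k * alpha) - k * lgamma(alpha)
            + sum(lgamma(alpha + s) for s in sizes) - lgamma(n + k * alpha))


def icl(clusters, nets, a=1.0, b=1.0, alpha=1.0):
    """Exact ICL of a partition; nets[i] = (edge count, dyad count)."""
    score = partition_term([len(c) for c in clusters], alpha)
    for c in clusters:
        edges = sum(nets[i][0] for i in c)
        dyads = sum(nets[i][1] for i in c)
        score += cluster_term(edges, dyads, a, b)
    return score


def agglomerate(nets, a=1.0, b=1.0, alpha=1.0):
    """Start from singleton clusters, repeatedly apply the merge that gives
    the highest ICL, and stop as soon as no merge increases the ICL."""
    clusters = [[i] for i in range(len(nets))]
    best = icl(clusters, nets, a, b, alpha)
    merges = []  # (cluster A, cluster B, ICL after merge), in merge order
    while len(clusters) > 1:
        candidates = []
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            merged = rest + [clusters[i] + clusters[j]]
            candidates.append((icl(merged, nets, a, b, alpha), i, j, merged))
        score, i, j, merged = max(candidates, key=lambda t: t[0])
        if score <= best:
            break  # automatic model selection: no merge improves the ICL
        merges.append((clusters[i], clusters[j], score))
        clusters, best = merged, score
    return clusters, best, merges


if __name__ == "__main__":
    # Three dense and three sparse toy networks, 50 dyads each.
    nets = [(45, 50), (44, 50), (46, 50), (3, 50), (2, 50), (4, 50)]
    clusters, score, merges = agglomerate(nets)
    print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

The sketch re-evaluates the full ICL for every candidate pair. A more careful implementation would only recompute the terms of the two clusters being merged plus the partition term. The recorded `merges` list gives the nested sequence of clusterings up to the stopping point, from which a dendrogram could be drawn.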