This paper addresses the problem of clustering multiple networks, directed or undirected, that do not share the same set of vertices, into groups of networks with similar topology. A model-based statistical approach built on a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood criterion, using a hierarchical agglomerative algorithm that starts from singleton clusters and successively merges clusters of networks. The algorithm thus computes a sequence of nested clusterings, which can be represented by a dendrogram that provides valuable insight into the collection of networks. Because a Bayesian framework is used, model selection is automatic: the algorithm stops when the best number of clusters is attained. When carefully implemented, the algorithm is computationally efficient. Merging clusters requires overcoming the label-switching problem of the stochastic block model, so that the block labels of different networks can be matched. To address this problem, a new tool is proposed based on a comparison of the graphons of the associated stochastic block models. The clustering approach is assessed on synthetic data, and an application to a set of ecological networks illustrates the interpretability of the results.
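The agglomerative scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function `greedy_agglomerate`, the stopping rule (stop as soon as no merge increases the criterion), and the stand-in `toy_criterion` are all assumptions made for illustration. In the paper the criterion is the integrated classification likelihood of a mixture of stochastic block models; here it is replaced by a penalized within-cluster sum of squares on made-up scalar values, so that the example stays self-contained.

```python
from itertools import combinations

def greedy_agglomerate(n_items, criterion):
    """Greedy bottom-up clustering (illustrative sketch).

    Starts from singleton clusters and repeatedly applies the merge that
    gives the largest criterion value, stopping when no merge improves it.
    Returns the final partition and the merge history (dendrogram data).
    """
    partition = [frozenset([i]) for i in range(n_items)]
    current = criterion(partition)
    merges = []
    while len(partition) > 1:
        best_value, best_pair = None, None
        for a, b in combinations(partition, 2):
            # Candidate partition with clusters a and b merged.
            candidate = [c for c in partition if c not in (a, b)] + [a | b]
            value = criterion(candidate)
            if best_value is None or value > best_value:
                best_value, best_pair = value, (a, b)
        if best_value <= current:
            break  # assumed stopping rule: no merge increases the criterion
        a, b = best_pair
        partition = [c for c in partition if c not in (a, b)] + [a | b]
        current = best_value
        merges.append((a, b, best_value))
    return partition, merges

# Hypothetical stand-in for the ICL: each "network" is summarized by one
# scalar, and the criterion is minus the within-cluster sum of squares
# minus a fixed penalty per cluster. Values and penalty are invented.
VALUES = [0.0, 0.1, 5.0, 5.2]
PENALTY = 1.0

def toy_criterion(partition):
    wss = 0.0
    for cluster in partition:
        xs = [VALUES[i] for i in cluster]
        mean = sum(xs) / len(xs)
        wss += sum((x - mean) ** 2 for x in xs)
    return -wss - PENALTY * len(partition)

final_partition, history = greedy_agglomerate(len(VALUES), toy_criterion)
```

The merge history records which clusters were joined and at what criterion value, which is the information a dendrogram displays. Evaluating every pair at every step is quadratic in the number of clusters per step; an efficient implementation would presumably avoid recomputing the full criterion for each candidate.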