The stochastic block model (SBM) is a random graph model in which different groups of vertices connect differently. It is widely employed as a canonical model to study clustering and community detection, and provides fertile ground for studying the information-theoretic and computational tradeoffs that arise in combinatorial statistics and, more generally, data science. This monograph surveys the recent developments that establish the fundamental limits for community detection in the SBM, with respect to both information-theoretic and computational tradeoffs, and for various recovery requirements such as exact, partial and weak recovery. The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal SNR-mutual information tradeoff for partial recovery, and the gap between information-theoretic and computational thresholds. The monograph gives a principled derivation of the main algorithms developed in the quest to achieve these limits, in particular two-round algorithms via graph-splitting, semi-definite programming, (linearized) belief propagation, classical/nonbacktracking spectral methods and graph powering. Extensions to other block models, such as geometric block models, and a few open problems are also discussed.
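As a concrete illustration of the model, the following minimal sketch samples a symmetric two-community SBM and evaluates the two thresholds in that special case. It is not taken from the monograph: the function names, the parametrization by `a` and `b`, and the restriction to two balanced communities are assumptions made here for illustration. In the sparse regime with within-community probability a/n and across-community probability b/n, the Kesten-Stigum condition reads (a - b)^2 > 2(a + b). In the logarithmic regime with probabilities a log(n)/n and b log(n)/n, the Chernoff-Hellinger condition reduces to |sqrt(a) - sqrt(b)| > sqrt(2).

```python
import math
import random


def sample_sbm(n, p_in, p_out, seed=0):
    """Sample a symmetric two-community SBM on n vertices.

    Each vertex gets a uniform label in {0, 1}. Each pair (i, j), i < j,
    is connected independently with probability p_in if the labels agree
    and p_out otherwise. Returns (labels, edges).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


def above_kesten_stigum(a, b):
    """Sparse regime (p_in = a/n, p_out = b/n): weak recovery condition."""
    return (a - b) ** 2 > 2 * (a + b)


def above_exact_recovery(a, b):
    """Logarithmic regime (p_in = a*log(n)/n, p_out = b*log(n)/n):
    two-community special case of the Chernoff-Hellinger condition."""
    return abs(math.sqrt(a) - math.sqrt(b)) > math.sqrt(2)
```

For example, `sample_sbm(200, 5 / 200, 1 / 200)` draws a sparse graph with a = 5 and b = 1, for which `above_kesten_stigum(5, 1)` holds since 16 > 12. The quadratic loop over vertex pairs is chosen for readability and is only practical for small n.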