The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of study have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting from both a mathematical and an applied standpoint. It states that, on sparse graphs, no estimator based on the network topology alone can perform substantially better than chance when the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, this fundamental limitation disappears completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structure. Our work brings a new perspective to stochastic models of networks and to semidefinite programming research.
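To make the setting concrete, the following is a minimal sketch of the semi-supervised setup described above, not the algorithms the abstract refers to. The abstract does not fix a parametrization, so the sketch assumes the standard symmetric two-community sparse model, in which two nodes are connected with probability a/n if they share a label and b/n otherwise; in that case the Kesten-Stigum condition for topology-only detection reads (a - b)^2 > 2(a + b). The function names `ks_detectable`, `sample_sbm`, and `reveal_labels` are hypothetical and chosen only for illustration.

```python
import random


def ks_detectable(a: float, b: float) -> bool:
    """Kesten-Stigum condition for the assumed symmetric two-community model:
    detection from topology alone is possible iff (a - b)^2 > 2 * (a + b)."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_sbm(n: int, a: float, b: float, rng: random.Random):
    """Sample labels in {-1, +1} and an edge list from the assumed sparse model.

    Same-label pairs are connected with probability a / n,
    different-label pairs with probability b / n.
    """
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = a / n if labels[i] == labels[j] else b / n
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


def reveal_labels(labels, fraction: float, rng: random.Random):
    """Reveal each node's true label independently with the given probability."""
    return {i: s for i, s in enumerate(labels) if rng.random() < fraction}


rng = random.Random(0)
a, b = 3.0, 1.0  # (a - b)^2 = 4 and 2 * (a + b) = 8, so this is below the threshold
labels, edges = sample_sbm(n=400, a=a, b=b, rng=rng)
revealed = reveal_labels(labels, fraction=0.1, rng=rng)
below_threshold = not ks_detectable(a, b)
```

With these illustrative parameters the graph alone falls in the regime where topology-only estimators cannot beat chance, which is the regime where the revealed labels in `revealed` become the additional information a semi-supervised method can exploit.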