The stochastic block model (SBM) is widely studied as a benchmark for graph clustering, also known as community detection. In practice, graph data often come with node attributes that carry additional information about the communities. Previous works modeled such data by assuming that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works in signal processing that use deep neural networks as priors, we propose to model the communities as being determined by the node attributes rather than the reverse. We define the corresponding model, which we call the neural-prior SBM. We propose an algorithm, stemming from statistical physics, that combines belief propagation and approximate message passing. We analyze the performance of this algorithm as well as the Bayes-optimal performance. We identify detectability and exact recovery phase transitions, as well as an algorithmically hard region. The proposed model and algorithm can be used as a benchmark for both theory and algorithms. To illustrate this, we compare the optimal performance to that of simple graph neural networks.
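The direction of generation described above (attributes first, then communities, then the graph) can be illustrated with a minimal sampling sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: Gaussian node attributes, a single-layer network with a sign activation as the prior, two communities, and a sparse SBM with within-community and across-community edge probabilities `c_in / n` and `c_out / n`. The function name `sample_neural_prior_sbm` and its parameters are hypothetical.

```python
import math
import random


def sample_neural_prior_sbm(n, d, c_in, c_out, seed=0):
    """Hypothetical sampler: labels come from attributes, edges from labels."""
    rng = random.Random(seed)

    # Assumed attribute model: i.i.d. standard Gaussian features for each node.
    features = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    # Assumed prior: a single-layer network with hidden Gaussian weights
    # and a sign activation, so each label is a function of the attributes.
    weights = [rng.gauss(0.0, 1.0) for _ in range(d)]
    labels = []
    for x in features:
        pre = sum(w_k * x_k for w_k, x_k in zip(weights, x)) / math.sqrt(d)
        labels.append(1 if pre >= 0 else -1)

    # Sparse two-community SBM: the edge probability depends only on
    # whether the two endpoints share a label (requires c_in, c_out <= n).
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = (c_in if labels[i] == labels[j] else c_out) / n
            if rng.random() < p:
                edges.append((i, j))

    return features, labels, edges
```

An inference algorithm would then observe `features` and `edges` and try to recover `labels`, while `weights` stays hidden. Setting `c_out` closer to `c_in` makes the graph less informative, which is the regime where a detectability transition would be probed.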