The contextual stochastic block model (cSBM) was proposed for unsupervised community detection on attributed graphs in which both the graph structure and the high-dimensional node information correlate with the node labels. In machine learning on graphs, the cSBM has been widely used as a synthetic dataset for evaluating graph neural networks (GNNs) on semi-supervised node classification. We consider a probabilistic, Bayes-optimal formulation of the inference problem and derive a belief-propagation-based algorithm for the semi-supervised cSBM; we conjecture that it is optimal in the considered setting, and we provide an implementation. We show that there can be a considerable gap between the accuracy reached by this algorithm and the performance of GNN architectures proposed in the literature. This suggests that the cSBM, together with a comparison against the optimal algorithm, readily accessible via our implementation, can be instrumental in developing better-performing GNN architectures.
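To make the synthetic dataset concrete, the following is a minimal sketch of a two-community cSBM sampler. The abstract does not specify a parametrization, so everything here is an assumption: the function name `sample_csbm`, the arguments `d` (average degree), `lam` (graph signal strength), and `mu` (feature signal strength), and the feature scaling `sqrt(mu / n)` follow one common convention from the cSBM literature and may differ from the authors' implementation. The sketch builds a dense adjacency matrix, so it is only suitable for small graphs.

```python
import numpy as np


def sample_csbm(n, p, d, lam, mu, seed=0):
    """Sample a two-community cSBM (assumed parametrization, not the authors' code).

    n: number of nodes, p: feature dimension, d: average degree,
    lam: graph signal strength, mu: feature signal strength.
    """
    if abs(lam) > np.sqrt(d):
        raise ValueError("need |lam| <= sqrt(d) so edge rates stay non-negative")
    rng = np.random.default_rng(seed)

    # Node labels drawn uniformly from {-1, +1}.
    y = rng.choice([-1, 1], size=n)

    # Edge probability is c_in / n within a community and c_out / n across.
    c_in = d + lam * np.sqrt(d)
    c_out = d - lam * np.sqrt(d)
    same = y[:, None] == y[None, :]
    probs = np.where(same, c_in, c_out) / n

    # Sample the strict upper triangle, then symmetrize (no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(np.int8)

    # Features: rank-one signal aligned with the labels plus Gaussian noise.
    u = rng.standard_normal(p)
    feats = np.sqrt(mu / n) * np.outer(y, u) + rng.standard_normal((n, p))
    return adj, feats, y
```

In a semi-supervised experiment one would reveal a fraction of the entries of `y` as training labels and score a classifier on the remaining nodes; for example, `adj, feats, y = sample_csbm(n=200, p=50, d=5.0, lam=1.0, mu=1.0)` yields a 200-node graph with 50-dimensional features.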