Spectral clustering is a widely used method for community detection in networks. We focus on a semi-supervised community detection scenario in the Partially Labeled Stochastic Block Model (PL-SBM) with two balanced communities, where the labels of a fixed fraction of nodes are revealed. Our approach leverages random walks in which the revealed nodes in each community act as absorbing states. By analyzing the quasi-stationary distributions associated with these random walks, we construct a classifier that distinguishes the two communities by examining differences in the associated eigenvectors. We establish upper and lower bounds on the error rate for a broad class of quasi-stationary algorithms, encompassing both spectral and voting-based approaches. In particular, we prove that this class of algorithms can achieve the optimal error rate in the connected regime. We further demonstrate empirically that our quasi-stationary approach improves performance on both real-world and simulated datasets.
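To make the quasi-stationary idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm analyzed in the paper. It assumes the simple random walk with transition matrix D^{-1}A, takes the quasi-stationary distribution to be the normalized left Perron eigenvector of that walk restricted to the non-absorbing nodes, and uses a hypothetical decision rule: an unlabeled node is assigned to the community whose revealed nodes were *not* absorbing in the walk that gives it more quasi-stationary mass. The function names (`sample_pl_sbm`, `quasi_stationary`, `classify`) and the parameter values are illustrative choices.

```python
import numpy as np


def sample_pl_sbm(n, p_in, p_out, reveal_frac, rng):
    """Sample a two-community SBM (n even) and reveal a random subset of labels."""
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Draw each undirected edge once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    revealed = rng.random(n) < reveal_frac
    return adj, labels, revealed


def quasi_stationary(adj, absorbing):
    """Quasi-stationary distribution of the random walk killed on `absorbing`.

    Assumption: the walk is D^{-1} A. Its restriction to the surviving nodes
    is similar to the symmetric matrix D^{-1/2} A D^{-1/2}, so the left Perron
    eigenvector is D^{1/2} times the top eigenvector of that symmetric matrix.
    """
    deg = np.maximum(adj.sum(axis=1), 1.0)  # guard against isolated nodes
    alive = ~absorbing
    d_alive = deg[alive]
    sym = adj[np.ix_(alive, alive)] / np.sqrt(np.outer(d_alive, d_alive))
    _, vecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    left = np.abs(vecs[:, -1]) * np.sqrt(d_alive)
    qsd = np.zeros(adj.shape[0])
    qsd[alive] = left / left.sum()
    return qsd


def classify(adj, labels, revealed):
    """Hypothetical rule: compare the two quasi-stationary distributions."""
    # Walk absorbed at revealed community-0 nodes survives longer in community 1.
    qsd0 = quasi_stationary(adj, revealed & (labels == 0))
    # Walk absorbed at revealed community-1 nodes survives longer in community 0.
    qsd1 = quasi_stationary(adj, revealed & (labels == 1))
    pred = np.where(qsd0 > qsd1, 1, 0)
    pred[revealed] = labels[revealed]  # keep the known labels
    return pred


rng = np.random.default_rng(0)
adj, labels, revealed = sample_pl_sbm(200, 0.3, 0.05, 0.2, rng)
pred = classify(adj, labels, revealed)
accuracy = (pred[~revealed] == labels[~revealed]).mean()
print(f"accuracy on unlabeled nodes: {accuracy:.3f}")
```

The edge probabilities here are chosen so that the communities are well separated and the graph is dense; in sparser regimes the restricted graph may be disconnected, and the top eigenvector would need more careful handling than this sketch provides.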