Social networks are often associated with rich side information, such as texts and images. While numerous methods have been developed to identify communities from pairwise interactions, they usually ignore such side information. In this work, we study an extension of the Stochastic Block Model (SBM), a widely used statistical framework for community detection, that integrates vectorial edge covariates: the Vectorial Edges Covariates Stochastic Block Model (VEC-SBM). We propose a novel algorithm based on iterative refinement techniques and show that it optimally recovers the latent communities under the VEC-SBM. Furthermore, we rigorously assess the added value of leveraging edge side information in the community detection process. We complement our theoretical results with numerical experiments on synthetic and semi-synthetic data.