Community detection is one of the most fundamental problems in network analysis. The stochastic block model (SBM) is a widely used model for which various estimation methods have been developed and their community detection consistency established. However, the SBM rests on the strong assumption that all nodes in the same community are stochastically equivalent, which may not hold in practical applications. We introduce the pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the covariate coefficients and of the community assignments, and show that both are consistent under suitable sparsity conditions. We introduce spectral clustering with adjustment (SCWA) to fit PCABM efficiently. Under certain conditions, we derive an error bound for community detection under SCWA and show that it is community detection consistent. In addition, we investigate model selection for the number of communities and feature selection for the pairwise covariates, and propose two corresponding algorithms. When covariate information is available, PCABM compares favorably with the SBM and the degree-corrected stochastic block model (DCBM) on a wide range of simulated and real networks.
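To make the idea of "adjusting" for pairwise covariates concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes, beyond what the abstract states, that edges are Poisson with rate B[c_i, c_j] * exp(z_ij · gamma), and that the adjustment step divides each observed edge count by exp(z_ij · gamma_hat) before ordinary spectral clustering. The function names `simulate_pcabm` and `spectral_cluster_adjusted` are hypothetical, and the true coefficient vector stands in for an estimate that would in practice come from a separate likelihood step.

```python
import numpy as np


def simulate_pcabm(n, B, gamma, rng):
    """Simulate a network under an ASSUMED Poisson form of the model:
    A_ij ~ Poisson(B[c_i, c_j] * exp(z_ij . gamma)) for i < j, symmetrized."""
    k = B.shape[0]
    labels = rng.integers(0, k, size=n)
    Z = rng.normal(size=(n, n, len(gamma)))
    Z = (Z + Z.transpose(1, 0, 2)) / 2          # symmetric pairwise covariates
    rate = B[labels][:, labels] * np.exp(Z @ gamma)
    upper = np.triu(rng.poisson(rate), k=1)      # keep i < j, no self-loops
    A = upper + upper.T
    return labels, Z, A


def spectral_cluster_adjusted(A, Z, gamma_hat, k, n_iter=50):
    """Divide out the (assumed) covariate effect, then cluster the rows of the
    k leading eigenvectors with a plain Lloyd-style k-means."""
    A_adj = A / np.exp(Z @ gamma_hat)            # covariate-adjusted adjacency
    vals, vecs = np.linalg.eigh(A_adj)
    U = vecs[:, np.argsort(np.abs(vals))[-k:]]   # k largest-magnitude eigenpairs

    # Deterministic farthest-point initialization of the k centers.
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = U[assign == j].mean(axis=0)
    return assign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = np.array([[2.0, 0.5], [0.5, 2.0]])       # illustrative block rates
    gamma = np.array([0.5])                      # illustrative coefficient
    labels, Z, A = simulate_pcabm(200, B, gamma, rng)
    est = spectral_cluster_adjusted(A, Z, gamma, k=2)
    agreement = np.mean(est == labels)
    # Cluster labels are only identified up to permutation (two labels here).
    print("agreement:", max(agreement, 1 - agreement))
```

The division step is used here because, under the assumed Poisson form, the adjusted entries have expectation B[c_i, c_j], so the adjusted matrix has the block structure that ordinary spectral clustering targets. The block rates, coefficient, and network size are arbitrary choices for illustration.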