Community detection is a crucial task in network analysis, and it can be significantly improved by incorporating subject-level information, i.e., covariates. However, current methods often struggle with selecting tuning parameters and with analyzing low-degree nodes. In this paper, we introduce a novel method that addresses these challenges by constructing network-adjusted covariates, which combine the network connections and the covariates using a node-specific weight determined by each node's degree. Spectral clustering on the network-adjusted covariates is tuning-free and computationally efficient, and it recovers the community labels exactly under certain conditions. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of mis-specification and of sparse communities with bounded degrees. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, which shows that our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network, and it provides interpretable community structures in a statistics publication citation network where $30\%$ of nodes are isolated.
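To make the construction concrete, the following is a minimal sketch of spectral clustering on network-adjusted covariates, written under stated assumptions rather than as the paper's implementation. The abstract says only that each node receives a weight based on its degree; the specific formula for `alpha` below is an illustrative assumption, as are the function names, the plain k-means routine, and the simulated two-community data.

```python
import numpy as np


def network_adjusted_covariates(A, X):
    """Return Y = A X + diag(alpha) X with a degree-based weight alpha_i.

    ASSUMPTION: the weight formula is illustrative only. The abstract states
    that the weight depends on node degree but does not give the formula.
    """
    n = A.shape[0]
    degrees = A.sum(axis=1)
    # Hypothetical choice: low-degree nodes lean more on their own covariates,
    # so an isolated node (degree 0) is represented by its covariates alone.
    alpha = (degrees.mean() / 2.0) / (degrees / np.log(n) + 1.0)
    return A @ X + alpha[:, None] * X


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_cluster(A, X, k):
    """Cluster the rows of the leading left singular vectors of Y."""
    Y = network_adjusted_covariates(A, X)
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    # If X has fewer than k columns, only that many singular vectors exist.
    return kmeans(U[:, :k], k)


def simulate(n=200, k=2, p_in=0.3, p_out=0.03, seed=0):
    """Toy block model with covariates centred at one-hot community means."""
    rng = np.random.default_rng(seed)
    z = np.arange(n) % k
    probs = np.where(z[:, None] == z[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, 1)
    A = (upper | upper.T).astype(float)  # symmetric, no self-loops
    X = np.eye(k)[z] + 0.1 * rng.standard_normal((n, k))
    return A, X, z


if __name__ == "__main__":
    A, X, z = simulate()
    labels = spectral_cluster(A, X, 2)
    # Community labels are identifiable only up to permutation.
    accuracy = max(np.mean(labels == z), np.mean(labels != z))
    print(f"agreement with true labels: {accuracy:.2f}")
```

No tuning parameter appears in the sketch: the weight is computed from the observed degrees, which mirrors the tuning-free property claimed above. The toy simulation is deliberately easy and is not evidence for the theoretical guarantees.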