Community detection is a central task in network analysis, and it can be improved substantially by incorporating subject-level information, i.e., covariates. However, existing methods often struggle with tuning-parameter selection and with low-degree nodes. In this paper, we introduce a novel method that addresses both challenges by constructing network-adjusted covariates, which combine network connections and covariates using a node-specific weight determined by each node's degree. Spectral clustering on the network-adjusted covariates is tuning-free and computationally efficient, and it recovers the community labels exactly under certain conditions. We establish the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even under model misspecification and in sparse communities with bounded degrees. We also derive a general lower bound for the community detection problem when both network and covariates are available, which shows that our method is optimal up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network, and it yields interpretable community structures in a statistics publication citation network in which $30\%$ of the nodes are isolated.
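As a rough illustration of the pipeline described in the abstract, the sketch below builds degree-weighted network-adjusted covariates and clusters them spectrally. The abstract does not state the exact construction, so several pieces are assumptions made only for this example: the form `Y = A X + diag(alpha) X`, the particular weight `alpha_i = (mean degree / 2) / (d_i / log n + 1)`, the row normalization of the singular vectors, and the small k-means routine with farthest-point initialization. The function names `network_adjusted_covariates` and `spectral_cluster` are hypothetical, and the sketch assumes `n > 1` and at least `K` covariate columns.

```python
import numpy as np


def network_adjusted_covariates(A, X):
    """Combine neighbors' covariates with a degree-weighted copy of a node's own.

    Assumed form (not given in the abstract): Y = A X + diag(alpha) X.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    # Assumed weight: decreasing in degree, so low-degree and isolated nodes
    # rely mostly on their own covariates. No tuning parameter is involved.
    alpha = (deg.mean() / 2.0) / (deg / np.log(n) + 1.0)
    return A @ X + alpha[:, None] * X


def spectral_cluster(Y, K, n_iter=50):
    """Cluster the rows of Y via its top-K left singular vectors and k-means."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    U = U[:, :K]
    # Row normalization is an assumption here, used to damp degree effects.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization of the K centers.
    centers = [U[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)

    # Plain Lloyd iterations: assign each row, then update each center.
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels
```

With this weight, an isolated node has `alpha_i` equal to half the mean degree, so its row of `Y` is a rescaled copy of its own covariates and it can still be assigned to a community. Calling `spectral_cluster(network_adjusted_covariates(A, X), K)` on a symmetric 0/1 adjacency matrix `A` and an `n × p` covariate matrix `X` returns one integer label per node, defined only up to a permutation of the labels.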