This paper establishes the theoretical limits of graph clustering under the Popularity-Adjusted Block Model (PABM), a model designed to address limitations of existing block models. The Stochastic Block Model (SBM) assumes that all vertices in the same cluster have the same expected degree. The Degree-Corrected Block Model (DCBM) gives each vertex a single degree parameter that applies uniformly to its connections with every cluster. PABM, in contrast, introduces separate popularity parameters for intra- and inter-cluster connections. Our main contribution is the characterization of the optimal error rate for clustering under PABM, which provides new insights into the hardness of clustering: we show that, unlike in SBM and DCBM, cluster recovery remains possible in PABM even when the traditional edge-density signals vanish, provided that the intra- and inter-cluster popularity coefficients differ. This highlights a dimension of degree heterogeneity that PABM captures but DCBM overlooks: local differences in connectivity patterns can enhance cluster separability independently of global edge densities. Finally, because PABM has a richer structure, its expected adjacency matrix has rank between $k$ and $k^2$, where $k$ is the number of clusters. As a result, spectral embeddings based on the top $k$ eigenvectors may miss important structural information. Our numerical experiments on both synthetic and real datasets confirm that spectral clustering algorithms using $k^2$ eigenvectors outperform traditional spectral approaches.
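To make the rank statement concrete, here is a minimal NumPy sketch. It assumes one common parameterization of PABM, $P_{ij} = \lambda_{i z_j}\lambda_{j z_i}$, where $z_i$ is the cluster of vertex $i$ and $\lambda_{ib}$ is the popularity of vertex $i$ toward cluster $b$. The notation, the function names (`pabm_expected_adjacency`, `numerical_rank`, `spectral_embedding`, `max_cross_cluster_inner_product`) and all parameter values are illustrative assumptions, not taken from this paper. The sketch builds the expected adjacency matrix for random popularities and for the SBM special case in which $\lambda_{ib}$ depends only on $z_i$. It then compares the two ranks and checks how well embeddings built from $k$ and from $k^2$ eigenvectors separate the clusters.

```python
import numpy as np


def pabm_expected_adjacency(z, lam):
    """Expected adjacency matrix under an assumed PABM parameterization:
    P[i, j] = lam[i, z[j]] * lam[j, z[i]], where lam[i, b] is the
    (hypothetical) popularity of vertex i toward cluster b."""
    A = lam[:, z]   # A[i, j] = lam[i, z[j]]
    return A * A.T  # elementwise product; A.T[i, j] = lam[j, z[i]]


def numerical_rank(P, rel_tol=1e-8):
    """Count eigenvalues whose magnitude exceeds rel_tol * largest magnitude."""
    ev = np.linalg.eigvalsh(P)
    return int(np.sum(np.abs(ev) > rel_tol * np.max(np.abs(ev))))


def spectral_embedding(P, d):
    """Eigenvectors for the d eigenvalues of largest magnitude (one row per vertex)."""
    ev, vecs = np.linalg.eigh(P)
    top = np.argsort(np.abs(ev))[::-1][:d]
    return vecs[:, top]


def max_cross_cluster_inner_product(X, z):
    """Largest |<x_i, x_j>| over vertex pairs i, j in different clusters."""
    G = X @ X.T
    different = z[:, None] != z[None, :]
    return float(np.max(np.abs(G[different])))


rng = np.random.default_rng(0)
k, n_per = 3, 50
z = np.repeat(np.arange(k), n_per)  # cluster labels 0..k-1
n = z.size

# PABM: every vertex has its own popularity toward every cluster.
lam = rng.uniform(0.1, 0.9, size=(n, k))
P_pabm = pabm_expected_adjacency(z, lam)

# SBM special case: popularity depends only on the vertex's own cluster,
# lam[i, b] = C[z[i], b], so P[i, j] = C[z_i, z_j] * C[z_j, z_i].
C = rng.uniform(0.1, 0.9, size=(k, k))
P_sbm = pabm_expected_adjacency(z, C[z])

print("rank SBM :", numerical_rank(P_sbm))   # at most k
print("rank PABM:", numerical_rank(P_pabm))  # at most k**2

for d in (k, k * k):
    X = spectral_embedding(P_pabm, d)
    print(f"d={d}: max cross-cluster |<x_i, x_j>| =",
          max_cross_cluster_inner_product(X, z))
```

In this noiseless example, the $k^2$ leading eigenvectors span the column space of the expected matrix, and embedding rows of vertices from different clusters are orthogonal up to rounding error. A clustering step on the $k^2$-dimensional embedding can exploit this structure, whereas the top-$k$ embedding does not have it. This is only an illustration on the expected matrix, not the clustering algorithm evaluated in the paper; on an observed adjacency matrix the orthogonality holds only approximately.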