Spectral clustering is a popular and effective algorithm for finding $k$ clusters in a graph $G$. In the classical spectral clustering algorithm, the vertices of $G$ are embedded into $\mathbb{R}^k$ using $k$ eigenvectors of the graph Laplacian matrix. However, computing this embedding is expensive and dominates the running time of the algorithm. In this paper, we present a simple spectral clustering algorithm based on a vertex embedding with $O(\log k)$ vectors computed by the power method. The vertex embedding is computed in nearly-linear time with respect to the size of the graph, and the algorithm provably recovers the ground-truth clusters under natural assumptions on the input graph. We evaluate the new algorithm on several synthetic and real-world datasets and find that it is significantly faster than alternative clustering algorithms while achieving approximately the same clustering accuracy.
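A minimal sketch of the pipeline described above, in Python with NumPy. The specific choices here are illustrative assumptions and are not taken from the paper: the power method is run on $M = I - \tfrac{1}{2}\mathcal{L}$, where $\mathcal{L}$ is the normalized Laplacian. It uses a fixed iteration count of 20 and the hypothetical constant $\lceil \log_2 k \rceil + 2$ for the number of vectors. The final step is Lloyd's k-means with farthest-first seeding. All function names are hypothetical. The adjacency matrix is dense for brevity; a sparse representation would be needed to reach nearly-linear time.

```python
import math

import numpy as np


def power_method_embedding(adj, num_vectors, num_iters, rng):
    """Embed each vertex as a row of D^{-1/2} M^t X (illustrative sketch).

    Assumption: M = I - L/2 with L the normalized Laplacian, i.e.
    M = (I + D^{-1/2} A D^{-1/2}) / 2, whose eigenvalues lie in [0, 1].
    """
    n = adj.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    x = rng.standard_normal((n, num_vectors))  # random starting vectors
    for _ in range(num_iters):
        # One power-method step x <- M x, applied to all vectors at once.
        x = 0.5 * (x + norm_adj @ x)
    # Rescale rows so the trivial eigenvector becomes a constant shift.
    return d_inv_sqrt[:, None] * x


def kmeans_farthest_first(points, k, max_iters=100):
    """Lloyd's k-means seeded deterministically by farthest-first traversal."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    labels = None
    for _ in range(max_iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.argmin(dists, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels


def power_spectral_clustering(adj, k, num_iters=20, seed=0):
    # Hypothetical constant: the abstract only states O(log k) vectors.
    num_vectors = math.ceil(math.log2(k)) + 2
    rng = np.random.default_rng(seed)
    embedding = power_method_embedding(adj, num_vectors, num_iters, rng)
    return kmeans_farthest_first(embedding, k)


def three_cliques(size=30):
    """Toy graph: three cliques joined in a triangle by single edges."""
    n = 3 * size
    adj = np.zeros((n, n))
    for c in range(3):
        block = slice(c * size, (c + 1) * size)
        adj[block, block] = 1.0
    np.fill_diagonal(adj, 0.0)
    for a, b in [(0, size), (size, 2 * size), (2 * size, 0)]:
        adj[a, b] = adj[b, a] = 1.0
    return adj


if __name__ == "__main__":
    print(power_spectral_clustering(three_cliques(), k=3))
```

The intuition behind this sketch is as follows. Every eigenvalue of $M$ lies in $[0, 1]$, so each multiplication shrinks the components along eigenvectors with a large Laplacian eigenvalue. After enough iterations, the rows are dominated by the bottom eigenvectors that separate the clusters. The random starting vectors then act like a random projection of the $k$-dimensional spectral embedding into a few dimensions.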