Spectral clustering is a fundamental method for graph partitioning, but its reliance on eigenvector computation limits its scalability on massive graphs. Classical sparsification methods preserve spectral properties by sampling edges in proportion to their effective resistances, but they require expensive preprocessing to estimate these resistances. We study whether uniform edge sampling, a simple, structure-agnostic strategy, can suffice for spectral clustering. Our main result shows that for graphs admitting a well-separated $k$-clustering, characterized by a large structure ratio $\Upsilon(k) = \lambda_{k+1} / \rho_G(k)$, uniform sampling preserves the spectral subspace used for clustering. Specifically, we prove that uniformly sampling $O(\gamma^2 n \log n / \epsilon^2)$ edges, where $\gamma$ is the Laplacian condition number, yields a sparsifier whose top $(n-k)$-dimensional eigenspace is approximately orthogonal to the cluster indicators. This ensures that the spectral embedding remains faithful and that clustering quality is preserved. Our analysis introduces new resistance bounds for intra-cluster edges, a rank-$(n-k)$ effective resistance formulation, and a matrix Chernoff bound adapted to the dominant eigenspace. These tools allow us to bypass importance sampling entirely. Conceptually, our result connects recent coreset-based clustering theory to spectral sparsification, showing that under strong clusterability, even uniform sampling is structure-aware. This provides the first provable guarantee that uniform edge sampling suffices for structure-preserving spectral clustering.
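The central claim, that the top $(n-k)$-dimensional eigenspace of a uniformly sampled sparsifier stays nearly orthogonal to the cluster indicators, can be checked numerically on a small synthetic graph. The following is a minimal sketch and not the paper's construction: the planted-partition generator, the choice of the normalized Laplacian, the $m/m'$ reweighting of kept edges, the fixed 50% sampling rate, and all function names are assumptions made for illustration. It reports, for each cluster, the squared norm of the degree-weighted indicator's projection onto the top $(n-k)$ eigenvectors, which is small when the indicator lies close to the bottom-$k$ eigenspace.

```python
import numpy as np


def planted_partition(n, k, p_in, p_out, rng):
    """Hypothetical test graph: k equal-size clusters, dense inside, sparse across."""
    labels = np.repeat(np.arange(k), n // k)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out)
    # Draw each unordered pair once by keeping only the strict upper triangle.
    upper = np.triu(rng.random((n, n)) < prob, 1)
    rows, cols = np.nonzero(upper)
    return np.column_stack([rows, cols]), labels


def normalized_laplacian(n, edges, weights):
    """Dense normalized Laplacian I - D^{-1/2} A D^{-1/2} and the weighted degrees."""
    A = np.zeros((n, n))
    A[edges[:, 0], edges[:, 1]] = weights
    A[edges[:, 1], edges[:, 0]] = weights
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros(n)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5  # guard against isolated vertices
    return np.eye(n) - inv_sqrt[:, None] * A * inv_sqrt[None, :], deg


def uniform_sparsify(edges, num_keep, rng):
    """Keep num_keep edges uniformly at random, without looking at graph structure.

    Assumption: kept edges are reweighted by m / num_keep so that the expected
    total edge weight matches the original graph.
    """
    m = len(edges)
    keep = rng.choice(m, size=num_keep, replace=False)
    return edges[keep], np.full(num_keep, m / num_keep)


def indicator_leakage(n, edges, weights, labels, k):
    """Squared projection of each normalized cluster indicator onto the
    top (n - k) eigenvectors of the normalized Laplacian."""
    L, deg = normalized_laplacian(n, edges, weights)
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    top = vecs[:, k:]            # eigenvectors for the n - k largest eigenvalues
    leakage = []
    for c in range(k):
        g = np.sqrt(deg) * (labels == c)  # degree-weighted indicator of cluster c
        g /= np.linalg.norm(g)
        leakage.append(float(np.sum((top.T @ g) ** 2)))
    return leakage


rng = np.random.default_rng(0)
n, k = 150, 3
edges, labels = planted_partition(n, k, p_in=0.6, p_out=0.01, rng=rng)
sparse_edges, sparse_weights = uniform_sparsify(edges, len(edges) // 2, rng)

full_leak = indicator_leakage(n, edges, np.ones(len(edges)), labels, k)
sparse_leak = indicator_leakage(n, sparse_edges, sparse_weights, labels, k)

print("edges kept:", len(sparse_edges), "of", len(edges))
print("leakage, full graph:", np.round(full_leak, 4))
print("leakage, sparsifier:", np.round(sparse_leak, 4))
```

On a well-separated instance like this one, the leakage should stay small for both the full graph and the sparsifier. Lowering `p_in` or raising `p_out` weakens the separation and gives a rough feel for how the guarantee depends on the structure ratio. The dense eigendecomposition is only practical for small $n$; it is used here for clarity rather than scalability.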