The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which have remained uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spiked random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix, and that the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece needed to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.
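The phenomenon described above can be illustrated with a small simulation. The sketch below assumes a balanced two-class mixture $X = \mu j^\top + Z$ with labels $j_i \in \{-1, +1\}$, isotropic Gaussian noise $Z$ (one particular rotationally invariant noise), and a linear kernel $K = X^\top X / p$. The function names, dimensions and signal strength are hypothetical choices for illustration, not the experimental setup of the letter. The sketch checks that the rescaled entries $\sqrt{n}\, v_i$ of the top eigenvector of $K$ split into two class-wise groups, and that a Gaussian model of each group, fitted with its empirical mean and standard deviation, reproduces the error rate of classifying by the sign of $v_i$.

```python
import math
import numpy as np

def spiked_gram_top_eigvec(n=2000, p=1000, snr=2.0, seed=0):
    """Top eigenvector of K = X^T X / p for the hypothetical two-class
    spiked model X = mu j^T + Z (illustration only)."""
    rng = np.random.default_rng(seed)
    j = np.repeat([-1.0, 1.0], n // 2)   # balanced class labels in {-1, +1}
    mu = np.zeros(p)
    mu[0] = snr                          # signal direction with ||mu|| = snr
    Z = rng.standard_normal((p, n))      # isotropic Gaussian noise (rotationally invariant)
    X = np.outer(mu, j) + Z              # signal + noise, one column per sample
    K = X.T @ X / p                      # n x n linear-kernel Gram matrix
    _, eigvecs = np.linalg.eigh(K)       # eigenvalues returned in ascending order
    v = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    if v @ j < 0:                        # fix the global sign ambiguity
        v = -v
    return v, j

def class_stats(v, j):
    """Per-class (mean, std) of the rescaled entries sqrt(n) * v_i."""
    u = math.sqrt(len(v)) * v
    return {a: (u[j == a].mean(), u[j == a].std()) for a in (-1.0, 1.0)}

def gaussian_error(stats):
    """Error rate of the rule sign(v_i) if each class's entries were Gaussian
    with the given (mean, std): average over classes a of Phi(-a * m_a / s_a)."""
    errs = [0.5 * math.erfc(a * m / (s * math.sqrt(2))) for a, (m, s) in stats.items()]
    return sum(errs) / len(errs)

if __name__ == "__main__":
    v, j = spiked_gram_top_eigvec()
    stats = class_stats(v, j)
    empirical = float(np.mean(np.sign(v) != j))
    print("per-class (mean, std):", stats)
    print("empirical error:", empirical, "Gaussian-model error:", gaussian_error(stats))
```

Because the class-wise means and standard deviations are estimated from the simulation itself, this only checks the Gaussian description of the eigenvector entries for consistency; the result described in the abstract is what is claimed to allow these quantities, and hence the classification performance, to be predicted from the model.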