The performance of spectral clustering depends on the fluctuations of the entries of the eigenvectors of a similarity matrix, fluctuations that had remained uncharacterized until now. In this letter, we show that the signal-plus-noise structure of a general spiked random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix, and that the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result is the last missing piece needed to precisely predict the classification performance of spectral clustering. The proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.
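The claimed phenomenon can be illustrated with a minimal numerical sketch. It is not the authors' code, and the setup is an assumption chosen for simplicity: a balanced two-class Gaussian mixture (a rank-one spike plus isotropic Gaussian noise, which is rotationally invariant), a linear kernel $K = X^\top X / p$, and hypothetical parameters `n`, `p`, `snr`. Within each class, the entries of the leading eigenvector of $K$ should concentrate around a class-dependent mean with roughly Gaussian fluctuations.

```python
import numpy as np


def top_eigvec_entries(n=800, p=400, snr=2.0, seed=0):
    """Simulate a two-class spike + noise model (hypothetical setup) and
    return the leading eigenvector of its linear-kernel Gram matrix."""
    rng = np.random.default_rng(seed)
    # Balanced labels +1 / -1 (assumption: two equal-size classes).
    labels = np.repeat([1, -1], n // 2)
    # Signal direction mu with squared norm equal to snr.
    mu = np.zeros(p)
    mu[0] = np.sqrt(snr)
    # Isotropic Gaussian noise: its law is invariant under rotations.
    Z = rng.standard_normal((p, n))
    X = np.outer(mu, labels) + Z          # p x n data: signal + noise
    K = X.T @ X / p                       # n x n linear-kernel Gram matrix
    _, eigvecs = np.linalg.eigh(K)        # eigenvalues in ascending order
    u = eigvecs[:, -1]                    # leading (unit-norm) eigenvector
    # Eigenvector sign is arbitrary: make class +1 have a positive mean.
    if u[labels == 1].mean() < 0:
        u = -u
    return u, labels


def within_class_stats(u, labels):
    """Mean, std and excess kurtosis of sqrt(n) * u_i inside each class."""
    n = u.size
    stats = {}
    for a in (1, -1):
        v = np.sqrt(n) * u[labels == a]
        m, s = v.mean(), v.std()
        excess_kurtosis = np.mean(((v - m) / s) ** 4) - 3.0  # 0 for a Gaussian
        stats[a] = (m, s, excess_kurtosis)
    return stats


if __name__ == "__main__":
    u, labels = top_eigvec_entries()
    for a, (m, s, k) in within_class_stats(u, labels).items():
        print(f"class {a:+d}: mean {m:.3f}, std {s:.3f}, excess kurtosis {k:.3f}")
    # Spectral clustering rule: assign each sample by the sign of its entry.
    print("accuracy:", np.mean(np.sign(u) == labels))
```

Rescaling the entries by $\sqrt{n}$ keeps their class means and spreads of order one, so the two within-class histograms can be compared directly; the misclassification rate of the sign rule is then governed by how much these two near-Gaussian distributions overlap.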