The spectral clustering algorithm, which is based on principal component analysis, is often used as a binary clustering method for unlabeled data. Existing theoretical studies of the algorithm commonly assume conditional homoscedasticity. However, this assumption is restrictive and often unrealistic in practice. In this paper, we therefore consider the allometric extension model, in which the first eigenvectors of the two covariance matrices and the difference of the two mean vectors all point in the same direction, and we provide a non-asymptotic bound on the error probability of the spectral clustering algorithm under this model. As a byproduct, we obtain the consistency of the clustering method in high-dimensional settings.
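To make the setting concrete, the following is a minimal simulation sketch, not the paper's implementation. It assumes that the spectral clustering algorithm labels each observation by the sign of its score on the first principal component of the centered data, and that the two clusters are Gaussian. The function names (`sample_allometric`, `spectral_bicluster`, `clustering_error`) and all parameter values are hypothetical choices made for illustration. The two clusters have different covariance matrices, so they are heteroscedastic, but both share the first eigenvector `v`, which is also the direction of the mean difference.

```python
import numpy as np


def sample_allometric(n_per_class, dim, rng, shift=4.0,
                      top_var=(2.0, 4.0), noise_var=(0.5, 1.0)):
    """Draw two Gaussian clusters satisfying an allometric-extension-type condition.

    Cluster k has mean -/+ shift * v and covariance
    top_var[k] * v v^T + noise_var[k] * (I - v v^T),
    so v is the first eigenvector of both covariances (top_var[k] > noise_var[k])
    and is parallel to the difference of the means.
    """
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)  # common unit direction
    blocks, labels = [], []
    for k, sign in enumerate((-1.0, 1.0)):
        a = rng.standard_normal((n_per_class, 1))    # component along v
        Z = rng.standard_normal((n_per_class, dim))
        Z_perp = Z - (Z @ v)[:, None] * v            # component orthogonal to v
        X_k = (sign * shift * v
               + np.sqrt(top_var[k]) * a * v
               + np.sqrt(noise_var[k]) * Z_perp)
        blocks.append(X_k)
        labels.append(np.full(n_per_class, k))
    return np.vstack(blocks), np.concatenate(labels)


def spectral_bicluster(X):
    """Label each row by the sign of its first principal component score."""
    Xc = X - X.mean(axis=0)
    # The leading right singular vector of the centered data is the first
    # eigenvector of the sample covariance matrix.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ vt[0] > 0).astype(int)


def clustering_error(pred, truth):
    """Misclassification rate, minimized over the two possible label swaps."""
    err = np.mean(pred != truth)
    return min(err, 1.0 - err)


rng = np.random.default_rng(0)
X, y = sample_allometric(n_per_class=500, dim=20, rng=rng)
print("empirical error rate:", clustering_error(spectral_bicluster(X), y))
```

Because the sign of a principal component is arbitrary, the error rate is computed up to a swap of the two labels. Varying `dim`, `shift`, and the variances gives an informal sense of how the empirical error responds to the quantities that such a bound would depend on.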