We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on prior knowledge of the eigenvalue spectrum, which can be expensive to estimate. In spectral clustering this cost is reduced, because the spectrum of the Laplacian or normalized Laplacian matrix can be estimated analytically, which makes the proposed algorithm very efficient for this application. Second, to make the proposed algorithm capable of analyzing big data, we develop a distributed and parallel version with attractive scalability. The speedup from parallel computing is approximately $\sqrt{p}$, where $p$ denotes the number of processes. Numerical results will be provided to demonstrate its efficiency in spectral clustering and its scalability advantage over existing eigensolvers used for spectral clustering in parallel computing environments.
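To illustrate why analytic spectrum bounds help, the following is a minimal single-process sketch, not the distributed Block Chebyshev-Davidson algorithm itself. It uses plain Chebyshev-filtered subspace iteration, a simpler relative of Chebyshev-Davidson, on a symmetric normalized Laplacian, whose eigenvalues are known to lie in $[0, 2]$, so the upper end of the damped interval needs no numerical estimation. All function names, the toy graph, the filter degree, and the cutoff 0.5 are assumptions made for this example.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def chebyshev_filter(L, X, degree, a, b):
    """Apply the degree-`degree` Chebyshev polynomial of L to the block X.

    The interval [a, b] is mapped to [-1, 1], where the polynomial stays
    bounded by 1; eigencomponents with eigenvalues below a are amplified.
    """
    c = (a + b) / 2.0  # center of the damped interval
    e = (b - a) / 2.0  # half-width of the damped interval
    Y_prev = X
    Y = (L @ X - c * X) / e
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        Y_prev, Y = Y, 2.0 * (L @ Y - c * Y) / e - Y_prev
    return Y

def smallest_eigenpairs(L, k, degree=8, cutoff=0.5, iters=20, seed=0):
    """Approximate the k smallest eigenpairs of a normalized Laplacian L.

    The upper bound 2.0 is the analytic bound on the spectrum; `cutoff`
    is a hypothetical lower edge of the unwanted part of the spectrum.
    """
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((L.shape[0], k)))
    for _ in range(iters):
        Y = chebyshev_filter(L, X, degree, cutoff, 2.0)
        X, _ = np.linalg.qr(Y)  # re-orthonormalize the filtered block
    # Rayleigh-Ritz projection onto the filtered subspace
    w, V = np.linalg.eigh(X.T @ L @ X)
    return w, X @ V

# Toy graph (an assumption for the demo): two 5-node cliques joined by one edge.
n = 5
A = np.zeros((2 * n, 2 * n))
A[:n, :n] = 1.0
A[n:, n:] = 1.0
np.fill_diagonal(A, 0.0)
A[n - 1, n] = A[n, n - 1] = 1.0

L = normalized_laplacian(A)
w, V = smallest_eigenpairs(L, k=2)

# The sign of the second eigenvector splits the nodes into two clusters.
labels = V[:, 1] > 0
```

In this sketch the only spectrum-dependent inputs are the two interval endpoints. A distributed variant would additionally have to partition the products `L @ X` and the orthonormalization across processes, which is not attempted here.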