A Graph Signal Filter used for dimensionality reduction in spectral clustering usually requires expensive eigenvalue estimation. We analyze the filter in an optimization setting and propose four orthogonalization-free methods that perform the dimensionality reduction step of spectral clustering by optimizing objective functions. The proposed methods avoid orthogonalization entirely, an operation known to scale poorly in parallel computing environments. In theory, our methods construct an adequate feature space that is, at most, a weighted alteration of the eigenspace of a normalized Laplacian matrix. Based on numerical experiments, we hypothesize that the proposed methods are equivalent in clustering quality to the ideal Graph Signal Filter, which is given the exact eigenvalue it needs and therefore skips expensive eigenvalue estimation. Numerical results show that the proposed methods outperform Power Iteration-based methods and the Graph Signal Filter in both clustering quality and computation cost. Unlike Power Iteration-based methods and the Graph Signal Filter, which require a random signal as input, our methods can reuse an available initialization in streaming graph scenarios. In those scenarios, numerical results also show that our methods outperform ARPACK and are faster than LOBPCG. Finally, we present numerical results on the scalability of our methods in multithreading and multiprocessing implementations, which facilitates parallel spectral clustering.
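To make the idea of orthogonalization-free dimensionality reduction concrete, the following is a minimal sketch and not the authors' implementation. It assumes one well-known orthogonalization-free objective, f(X) = ||A + X Xᵀ||_F², with the shifted matrix A = L − 2I; the abstract does not say which four objectives are used, so this choice, the plain gradient descent solver, the toy graph, and the names `normalized_laplacian` and `ofm_embedding` are all illustrative assumptions. The minimizer spans the eigenspace of the k smallest eigenvalues of L, with each direction scaled by a weight, which is one instance of a "weighted alteration" of the eigenspace.

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2} of a weighted graph."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def ofm_embedding(L, k, steps=2000, lr=0.02, seed=0, X0=None):
    """Minimize ||A + X X^T||_F^2 with A = L - 2I by gradient descent.

    No QR or Gram-Schmidt step is used. X0 is a hypothetical warm start,
    e.g. the embedding of a previous snapshot of a streaming graph.
    """
    n = L.shape[0]
    A = L - 2.0 * np.eye(n)  # eigenvalues of L lie in [0, 2], so A is negative semidefinite
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((n, k)) if X0 is None else X0.copy()
    for _ in range(steps):
        grad = 4.0 * (A @ X + X @ (X.T @ X))  # gradient of the objective for symmetric A
        X -= lr * grad
    return X

# Toy graph (assumed for illustration): two dense blocks of 10 nodes, weakly linked.
n_half = 10
W = np.full((2 * n_half, 2 * n_half), 0.01)
W[:n_half, :n_half] = 1.0
W[n_half:, n_half:] = 1.0
np.fill_diagonal(W, 0.0)

L = normalized_laplacian(W)
X = ofm_embedding(L, k=2)  # rows of X are the low-dimensional features
```

The rows of `X` can then be passed to any standard clustering routine such as k-means. Because the columns of `X` are never orthogonalized, the loop consists only of matrix products, which is the property that makes this family of methods attractive in parallel settings.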