Vertex-level clustering of directed graphs (digraphs) remains challenging because edge directionality breaks key assumptions underlying popular spectral methods, which also incur the cost of eigendecomposition. This paper proposes Parametrized Power-Iteration Clustering (ParPIC), a random-walk-based clustering method for weakly connected digraphs. ParPIC builds on the Power-Iteration Clustering paradigm, which uses the rows of the iterated diffusion operator as a data embedding. ParPIC has three key features: parametrized reversible random-walk operators, automatic tuning of the diffusion time, and efficient truncation of the final embedding, which yields low-dimensional data representations and reduces computational complexity. Empirical results on synthetic and real-world graphs show that ParPIC achieves competitive clustering accuracy with improved scalability relative to spectral and teleportation-based methods.