As a well-known community detection algorithm, Leiden has been widely used in scenarios such as large language model generation (e.g., Graph-RAG), anomaly detection, and biological analysis. In these scenarios, the graphs are often large and dynamic, with vertices and edges inserted and deleted frequently, so recomputing the communities with Leiden from scratch after every change is costly. One recent work has studied how to maintain Leiden communities in dynamic graphs, but it lacks a detailed theoretical analysis, and its algorithms are inefficient on large graphs. To address these issues, we first show theoretically, via boundedness analysis (a powerful tool for analyzing incremental algorithms on dynamic graphs), that the existing algorithms are relatively unbounded, and we analyze the community memberships of vertices when the graph changes. Based on this theoretical analysis, we develop a novel and efficient maintenance algorithm, called Hierarchical Incremental Tree Leiden (HIT-Leiden), which effectively reduces the range of affected vertices by maintaining the connected components and hierarchical community structures. Comprehensive experiments on various datasets demonstrate the superior performance of HIT-Leiden. In particular, it achieves speedups of up to five orders of magnitude over existing methods.