Community detection is a foundational capability in large-scale industrial graph analytics, powering applications such as fraud-ring discovery, recommendation systems, and hierarchical indexing for retrieval-augmented generation. Among modularity-based methods, the Leiden algorithm is widely adopted in production because it delivers high-quality communities with connectivity guarantees. However, real-world graphs evolve continuously, and communities must be updated promptly to keep downstream features and retrieval indices fresh. Existing dynamic Leiden approaches recompute communities whenever their vertices or edges change, so under frequent updates they degrade to near-full recomputation. To address this inefficiency, we study the maintenance of Leiden communities in large dynamic graphs and present a novel algorithm, Hierarchical Incremental Tree Leiden (HIT-Leiden). We first provide a boundedness analysis showing that prior incremental Leiden methods can incur essentially unbounded work even for small updates. Guided by this analysis, HIT-Leiden narrows the set of affected vertices by maintaining connected components and hierarchical community structures. Extensive experiments on large real-world dynamic graphs demonstrate that HIT-Leiden achieves community quality comparable to state-of-the-art competitors while delivering speedups of up to five orders of magnitude over existing solutions. Results from a production deployment show that HIT-Leiden meets stringent latency requirements under high-rate updates at scale.
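For readers unfamiliar with the two properties the abstract attributes to Leiden, modularity-based quality and connectivity guarantees, the following is a minimal background sketch. It is not HIT-Leiden or any part of it. It assumes an undirected, unweighted graph given as a non-empty edge list, uses the standard modularity definition $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$ (with $L_c$ the number of intra-community edges, $d_c$ the total degree of community $c$, and $m$ the number of edges), and the function names `modularity` and `communities_connected` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard modularity of a partition (undirected, unweighted graph).

    edges: non-empty list of (u, v) pairs; community: dict vertex -> label.
    """
    m = len(edges)
    intra = defaultdict(int)       # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of vertices in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def communities_connected(edges, community):
    """Return True if every community induces a connected subgraph."""
    adj = defaultdict(list)
    for u, v in edges:
        if community[u] == community[v]:  # keep intra-community edges only
            adj[u].append(v)
            adj[v].append(u)
    members = defaultdict(list)
    for vertex, label in community.items():
        members[label].append(vertex)
    for nodes in members.values():
        seen = {nodes[0]}
        stack = [nodes[0]]
        while stack:  # depth-first search restricted to the community
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if len(seen) != len(nodes):
            return False
    return True


# Illustrative toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
disconnected = {0: "a", 1: "a", 4: "a", 2: "b", 3: "b", 5: "b"}

print(modularity(edges, two_triangles))
print(communities_connected(edges, two_triangles))
print(communities_connected(edges, disconnected))
```

On the toy graph, splitting along the bridge edge gives a modularity of 5/14, while the `disconnected` partition places vertex 4 in a community with no internal edge to its other members, which is the kind of badly connected community that Leiden's guarantee rules out.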