Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Moreover, while many real-world graphs are dynamic, previous DGC methods have considered only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering that fundamentally differs from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme such that they reflect hierarchical community structures and network homophily. We also extend CGC to time-evolving data, where temporal graph clustering is performed in an incremental learning fashion, with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.
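To make the contrastive idea in the abstract concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor node. It is not the CGC objective, which the abstract does not specify. The function name `info_nce_loss`, the cosine-similarity scoring, and the `temperature` parameter are all assumptions made for illustration. In a graph setting, positives might be embeddings of neighbors or same-community nodes, and negatives might be embeddings of distant nodes.

```python
import numpy as np


def info_nce_loss(anchor, positives, negatives, temperature=0.5):
    """Generic InfoNCE-style contrastive loss for one anchor embedding.

    Illustrative only; this is NOT the objective used by CGC.
    anchor:    array of shape (d,)
    positives: array of shape (p, d), samples that should be close to the anchor
    negatives: array of shape (n, d), samples that should be far from the anchor
    """
    def normalize(x):
        # Scale to unit length so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    pos_sim = normalize(np.asarray(positives, dtype=float)) @ a / temperature
    neg_sim = normalize(np.asarray(negatives, dtype=float)) @ a / temperature

    # Numerically stable log-sum-exp over all candidates (positives and negatives).
    all_sim = np.concatenate([pos_sim, neg_sim])
    m = all_sim.max()
    log_denom = m + np.log(np.exp(all_sim - m).sum())

    # Average negative log-probability assigned to each positive.
    return float(np.mean(log_denom - pos_sim))


# Toy usage: the loss is lower when the positive is aligned with the anchor.
anchor = np.array([1.0, 0.0])
near = np.array([[1.0, 0.1]])
far = np.array([[-1.0, 0.0], [0.0, -1.0]])
loss_aligned = info_nce_loss(anchor, near, far)
loss_swapped = info_nce_loss(anchor, far[:1], np.vstack([near, far[1:]]))
```

Minimizing such a loss pulls the anchor toward its positives and pushes it away from its negatives, so the choice of samples determines which structure the embeddings encode.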