Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, deep clustering methods have achieved state-of-the-art clustering performance in various domains. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Moreover, while many real-world graphs are dynamic, previous DGC methods have considered only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering that fundamentally differs from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme such that they reflect hierarchical community structures and network homophily. We also extend CGC to time-evolving data, where temporal graph clustering is performed in an incremental learning fashion, with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.
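To make the idea of contrasting positive and negative samples concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor node. It is not the CGC objective, which the abstract does not specify. The function name `info_nce_loss`, the `temperature` value, the toy embeddings, and the choice to treat a nearby node as the positive sample (one simple reading of network homophily) are all assumptions made for illustration.

```python
import numpy as np


def info_nce_loss(embeddings, anchor, positives, negatives, temperature=0.5):
    """Generic InfoNCE-style loss for one anchor node.

    Illustrative only: this is NOT the CGC objective. All names and the
    default temperature are assumptions.
    """
    # L2-normalize rows so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    pos_sim = z[positives] @ z[anchor] / temperature
    neg_sim = z[negatives] @ z[anchor] / temperature

    losses = []
    for s in pos_sim:
        # Each positive competes against all negatives in a softmax.
        logits = np.concatenate(([s], neg_sim))
        m = logits.max()  # subtract the max for numerical stability
        log_denominator = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denominator - s)
    return float(np.mean(losses))


# Toy embeddings (hypothetical): nodes 0 and 1 lie close together,
# nodes 2 and 3 lie elsewhere.
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [-1.0, 0.0],
    [0.0, -1.0],
])

# Positive = the nearby node 1; negatives = the distant nodes 2 and 3.
loss_aligned = info_nce_loss(toy_embeddings, anchor=0, positives=[1], negatives=[2, 3])

# Positive = the distant node 2; the nearby node 1 is wrongly used as a negative.
loss_misaligned = info_nce_loss(toy_embeddings, anchor=0, positives=[2], negatives=[1, 3])
```

In this toy setting the loss is lower when the chosen positive is the node whose embedding is already close to the anchor, which is the behavior a contrastive objective rewards. How positives and negatives are actually selected across multiple levels in CGC is described only at a high level in the abstract and is not reproduced here.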