GCLS$^2$: Towards Efficient Community Detection Using Graph Contrastive Learning with Structure Semantics

Owing to its ability to learn representations from unlabeled graphs, graph contrastive learning (GCL) has shown excellent performance in community detection tasks. Existing GCL-based methods for community detection usually focus on learning attribute representations of individual nodes, which ignores the structural semantics of communities (e.g., nodes in the same community should be structurally cohesive). In this paper, we therefore study community detection under community structure semantics and propose an effective framework, graph contrastive learning under structure semantics (GCLS$^2$), to detect communities. To integrate the dense-interior and sparse-exterior characteristics of communities into our contrastive learning strategy, we employ classic community structures to extract high-level structural views and design a structure semantic expression module that augments the original structural feature representation. Moreover, we formulate a structure contrastive loss to optimize the feature representation of nodes so that it better captures the topology of communities. To scale to large networks, we design a high-level graph partitioning (HGP) algorithm that minimizes the community detection loss for online training of GCLS$^2$. Notably, we prove a lower bound on the training of GCLS$^2$ from the perspective of information theory, which explains why GCLS$^2$ can learn a more accurate representation of the structure. Extensive experiments on various real-world graph datasets confirm that GCLS$^2$ outperforms nine state-of-the-art methods in terms of the accuracy, modularity, and efficiency of community detection.
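Modularity, one of the evaluation criteria named above, rewards partitions whose communities are dense inside and sparsely connected to the outside. As a minimal sketch, assuming the standard Newman definition for an undirected, unweighted graph (the abstract does not state which variant is used), it can be computed as follows. The function name `modularity` and its arguments are illustrative and not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    Assumption: this is the standard definition, not necessarily the
    exact variant used in the GCLS^2 experiments.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = defaultdict(int)   # number of edges inside each community
    degree = defaultdict(int)  # summed node degree of each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1

    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_community = {node: "a" for node in range(6)}

q_split = modularity(edges, two_communities)   # 5/14, about 0.357
q_merged = modularity(edges, one_community)    # 0.0
```

Splitting the toy graph along the bridge scores higher than merging everything into one community, which is the "structurally cohesive" behavior the abstract refers to.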
