Deep Cut-informed Graph Embedding and Clustering

Graph clustering aims to partition the nodes of a graph into distinct clusters. Recent deep graph clustering approaches are largely built on graph neural networks (GNNs). However, GNNs are designed for general-purpose graph encoding, and existing GNN-based deep graph clustering algorithms commonly suffer from representation collapse. We attribute this issue to two main causes: (i) the inductive bias of GNN models: GNNs tend to generate similar representations for proximal nodes. Since graphs often contain a non-negligible number of inter-cluster links, this bias leads to erroneous message passing and, in turn, to biased clustering; (ii) the clustering-guided loss function: most traditional approaches strive to pull all samples closer to pre-learned cluster centers, which admits a degenerate solution that assigns all data points to a single label and thus makes the sample representations similar and less discriminative. To address these challenges, we investigate graph clustering from a graph cut perspective and propose a non-GNN-based Deep Cut-informed Graph embedding and Clustering framework, namely DCGC. The framework includes two modules: (i) cut-informed graph encoding; (ii) self-supervised graph clustering via optimal transport. For the encoding module, we derive a cut-informed graph embedding objective that fuses graph structure and attributes by minimizing their joint normalized cut. For the clustering module, we use optimal transport theory to obtain the clustering assignments, which balances the guidance of "proximity to the pre-learned cluster center". With these two tailored designs, DCGC is better suited to the graph clustering task: it effectively alleviates representation collapse and achieves better performance. We conduct extensive experiments to demonstrate that our method is simple yet effective compared with benchmark methods.
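
To illustrate how optimal transport can counteract the degenerate single-cluster solution described above, the following is a minimal sketch of balanced cluster assignment with Sinkhorn-Knopp iterations. It is not the DCGC implementation: the abstract does not specify the solver, so the function name `sinkhorn_assign`, the parameters `epsilon` and `n_iters`, and the assumption of equally sized clusters are all hypothetical choices made for illustration.

```python
import numpy as np


def sinkhorn_assign(scores, epsilon=0.1, n_iters=200):
    """Balanced soft cluster assignments via Sinkhorn-Knopp iterations.

    scores: (n_samples, n_clusters) similarity of each sample to each
        cluster center (hypothetical input; higher means closer).
    Returns Q whose rows sum to 1 and whose columns sum to roughly
    n_samples / n_clusters once the iterations have converged.
    """
    n, k = scores.shape
    # Gibbs kernel; subtracting the max keeps exp() numerically stable.
    K = np.exp((scores - scores.max()) / epsilon)
    r = np.full(n, 1.0 / n)  # every sample carries the same mass
    c = np.full(k, 1.0 / k)  # assumption: clusters are equally sized
    u = np.ones(n)
    for _ in range(n_iters):
        v = c / (K.T @ u)  # rescale columns to match cluster marginals
        u = r / (K @ v)    # rescale rows to match sample marginals
    Q = u[:, None] * K * v[None, :]
    # Renormalize so each row is a distribution over clusters.
    return Q / Q.sum(axis=1, keepdims=True)


# Usage: cluster 0 is the closest center for most samples, so a plain
# argmax over `scores` would be heavily skewed toward it.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=(60, 3))
scores[:, 0] += 0.5
Q = sinkhorn_assign(scores, epsilon=0.5)
print(Q.sum(axis=0))  # per-cluster mass after balancing
```

The equal-size constraint is what prevents every sample from collapsing onto the nearest center. A smaller `epsilon` yields sharper assignments but typically needs more iterations to satisfy the column constraint.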
