Homophily-enhanced Structure Learning for Graph Clustering

Graph clustering is a fundamental task in graph analysis, and recent methods based on graph neural networks (GNNs) have shown impressive results. Despite their success, existing GNN-based graph clustering methods often overlook the quality of the graph structure. Real-world graphs are sparse and multifarious, so structural deficiencies such as missing links and spurious connections are inherent in them and lead to subpar clustering performance. Graph structure learning refines the input graph by adding missing links and removing spurious connections. However, previous work on graph structure learning has predominantly targeted supervised settings and cannot be directly applied to clustering, where ground-truth labels are absent. To bridge this gap, we propose a novel method called \textbf{ho}mophily-enhanced structure \textbf{le}arning for graph clustering (HoLe). Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules: hierarchical correlation estimation and cluster-aware sparsification. The former yields a more accurate estimate of pairwise node relationships by leveraging guidance from the latent and clustering spaces, while the latter generates a sparsified structure based on the similarity matrix and the clustering assignments. Additionally, we devise a joint optimization approach that alternates between training the homophily-enhanced structure learning and the GNN-based clustering, so that the two reinforce each other. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe over state-of-the-art baselines.
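To make the notion of homophily concrete, the following is a minimal illustrative sketch, not the HoLe implementation. It assumes the common edge-homophily definition (the fraction of edges whose endpoints share a label) and uses a toy refinement rule that drops low-similarity edges between clusters and adds high-similarity pairs within a cluster. The function names `edge_homophily` and `refine_edges`, the thresholds, and the similarity values are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def edge_homophily(edges: Iterable[Edge], labels: Sequence[int]) -> float:
    """Fraction of edges whose two endpoints carry the same label."""
    edge_list = list(edges)
    if not edge_list:
        return 0.0
    same = sum(1 for u, v in edge_list if labels[u] == labels[v])
    return same / len(edge_list)


def refine_edges(
    edges: Iterable[Edge],
    sim: Sequence[Sequence[float]],
    clusters: Sequence[int],
    add_thresh: float,
    drop_thresh: float,
) -> Set[Edge]:
    """Toy cluster-aware refinement (an assumption, not the paper's rule).

    - Drop an existing edge if its endpoints lie in different clusters
      and their similarity is below drop_thresh.
    - Add an edge between two nodes of the same cluster if their
      similarity is at least add_thresh.
    Edges are treated as undirected and stored as (min, max) pairs.
    """
    n = len(clusters)
    refined: Set[Edge] = set()
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        if clusters[a] != clusters[b] and sim[a][b] < drop_thresh:
            continue  # likely spurious inter-cluster link
        refined.add((a, b))
    for a in range(n):
        for b in range(a + 1, n):
            if clusters[a] == clusters[b] and sim[a][b] >= add_thresh:
                refined.add((a, b))  # likely missing intra-cluster link
    return refined


# Hypothetical 4-node example: two clusters, two cross-cluster edges.
labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
# In an unsupervised setting the clusters would be predicted assignments;
# here they coincide with the labels purely to keep the example small.
refined = refine_edges(edges, sim, labels, add_thresh=0.7, drop_thresh=0.3)
print(edge_homophily(edges, labels), edge_homophily(refined, labels))
```

In this toy graph the two cross-cluster edges are removed, so the edge homophily rises from 0.5 to 1.0. In a real clustering setting the labels are unavailable, so homophily can only be measured after the fact on benchmarks, and the refinement must rely on learned similarities and predicted cluster assignments.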
