Graph clustering is a longstanding topic in machine learning. Recently, deep methods have achieved promising results but still require a predefined number of clusters K and struggle with imbalanced graphs. We study deep graph clustering without a predefined K under realistic imbalance, through the lens of structural information theory. In the literature, structural information is rarely used in deep clustering, and its classic discrete definition neglects node attributes while exhibiting prohibitive computational complexity. In this paper, we establish a differentiable structural information framework, generalizing the discrete formalism to the continuous realm. We design a hyperbolic model (LSEnet) to learn the neural partitioning tree in the Lorentz model of hyperbolic space. Theoretically, we demonstrate its capability of clustering without K and of identifying minority clusters. We further refine the hyperbolic representations to enhance graph semantics. Since tree contrastive learning is non-trivial and incurs quadratic complexity, we advance our theory by showing that structural entropy bounds the tree contrastive loss. Finally, we approach graph clustering through a novel augmented structural information learning (ASIL) method, which offers an efficient objective that integrates hyperbolic partitioning tree construction and contrastive learning. With a provable improvement in graph conductance, ASIL achieves effective debiased graph clustering with linear complexity. Extensive experiments show ASIL outperforms 20 strong baselines by an average of +12.42% in NMI on the Citeseer dataset.
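To make the classic discrete notion concrete: for a graph partition, two-dimensional structural entropy charges each node for its position within its block and each block for its boundary edges, so low-entropy partitions are exactly those with dense blocks and few crossing edges. The sketch below implements this standard discrete definition (the one the paper generalizes to a differentiable form); the function name and the toy two-triangle graph are illustrative choices, not the paper's code.

```python
import math

def structural_entropy(edges, partition):
    """Two-dimensional structural entropy of an undirected graph under a
    node partition (classic discrete definition; illustrative sketch only,
    not the differentiable formulation proposed in the paper)."""
    # Node degrees and total volume vol(G) = 2m.
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    vol_G = sum(deg.values())

    H = 0.0
    for block in partition:
        block = set(block)
        vol_X = sum(deg[i] for i in block)          # block volume
        # g_X: edges with exactly one endpoint inside the block.
        g_X = sum(1 for u, v in edges if (u in block) != (v in block))
        # Intra-block node terms: -(d_i / vol(G)) * log2(d_i / vol(X)).
        for i in block:
            H -= deg[i] / vol_G * math.log2(deg[i] / vol_X)
        # Block boundary term: -(g_X / vol(G)) * log2(vol(X) / vol(G)).
        H -= g_X / vol_G * math.log2(vol_X / vol_G)
    return H

# Toy example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [{0, 1, 2}, {3, 4, 5}]   # respects the two communities
bad = [{0, 1, 3}, {2, 4, 5}]    # cuts across them
print(structural_entropy(edges, good) < structural_entropy(edges, bad))
```

The community-respecting partition attains strictly lower entropy than the mixed one, which is why minimizing structural entropy over partitioning trees serves as a clustering objective that needs no predefined K.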