Contrastive graph node clustering via learnable data augmentation is an active research topic in unsupervised graph learning. Existing methods learn the sampling distribution of a pre-defined augmentation to generate data-driven augmentations automatically. Although these methods achieve promising clustering performance, we observe that they still rely on pre-defined augmentations, so the semantics of the augmented graph can easily drift. As a result, the reliability of the augmented view semantics for contrastive learning cannot be guaranteed, which limits model performance. To address these problems, we propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT). Specifically, in our method, the data augmentations are processed by the proposed reversible perturb-recover network, which distills reliable semantic information by recovering the perturbed latent embeddings. Moreover, to further guarantee the reliability of the semantics, we present a novel semantic loss that constrains the network by quantifying the perturbation and recovery. Lastly, a label-matching mechanism guides the model with clustering information by aligning the semantic labels with the selected high-confidence clustering pseudo labels. Extensive experimental results on seven datasets demonstrate the effectiveness of the proposed method. We release the code and appendix of CONVERT at https://github.com/xihongyang1999/CONVERT.
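The perturb-recover and label-matching ideas described in the abstract can be pictured with the minimal NumPy sketch below. It is not the released CONVERT code, and the abstract does not specify these details: the Gaussian perturbation, the linear recover map `w`, the squared-error recovery measure, the 0.9 confidence threshold, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def perturb(z, rng, scale=0.1):
    """Add Gaussian noise to latent embeddings (assumed form of perturbation)."""
    return z + scale * rng.standard_normal(z.shape)


def recover(z_perturbed, w):
    """Hypothetical linear recover map; a learned network would go here."""
    return z_perturbed @ w


def recovery_error(z, z_recovered):
    """Mean squared distance between original and recovered embeddings."""
    return float(np.mean(np.sum((z - z_recovered) ** 2, axis=1)))


def high_confidence_mask(soft_assign, threshold=0.9):
    """Mark nodes whose largest cluster probability reaches the threshold."""
    return soft_assign.max(axis=1) >= threshold


def label_matching_loss(semantic_probs, soft_assign, threshold=0.9):
    """Cross-entropy of semantic predictions against confident pseudo labels."""
    mask = high_confidence_mask(soft_assign, threshold)
    if not mask.any():
        return 0.0  # no node is confident enough to supervise
    pseudo = soft_assign[mask].argmax(axis=1)
    probs = semantic_probs[mask]
    picked = probs[np.arange(len(pseudo)), pseudo]
    return float(-np.mean(np.log(picked + 1e-12)))


# Toy usage: 4 nodes, 3-dimensional embeddings, 2 clusters.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 3))
z_rec = recover(perturb(z, rng), np.eye(3))  # identity map as a stand-in
soft_assign = np.array([[0.95, 0.05], [0.6, 0.4], [0.1, 0.9], [0.5, 0.5]])
semantic_probs = np.array([[0.8, 0.2], [0.5, 0.5], [0.3, 0.7], [0.5, 0.5]])
total = recovery_error(z, z_rec) + label_matching_loss(semantic_probs, soft_assign)
```

In this toy setup only the first and third nodes pass the confidence threshold, so only they contribute to the label-matching term; summing the two terms with equal weight is likewise an illustrative choice, not something stated in the abstract.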