This paper studies graph-level clustering, a novel and challenging task that is critical in a variety of real-world applications, such as protein clustering and genome analysis in bioinformatics. Recent years have witnessed the success of deep clustering coupled with graph neural networks (GNNs). However, existing methods focus on clustering the nodes of a single graph, while clustering across multiple graphs remains under-explored. In this paper, we propose a general framework for clustering multiple graphs, named Graph-Level Contrastive Clustering (GLCC). Specifically, GLCC first constructs an adaptive affinity graph to explore instance- and cluster-level contrastive learning (CL). Instance-level CL leverages a graph-Laplacian-based contrastive loss to learn clustering-friendly representations, while cluster-level CL captures discriminative cluster representations by incorporating the neighbor information of each sample. Moreover, we utilize neighbor-aware pseudo-labels to reward the optimization of representation learning. The two steps can be trained alternately so that they collaborate and benefit each other. Experiments on a range of well-known datasets demonstrate the superiority of the proposed GLCC over competitive baselines.
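To make the role of the affinity graph concrete, the following is a minimal NumPy sketch, not the GLCC implementation. It assumes graph-level embeddings `z` have already been produced by some GNN encoder, builds a fixed k-nearest-neighbor affinity matrix (a stand-in for the adaptive affinity graph), treats affinity-graph neighbors as positives in an InfoNCE-style loss (a stand-in for the graph-Laplacian-based loss), and smooths soft cluster assignments over neighbors to obtain pseudo-labels. The function names, the cosine-similarity kNN rule, and the temperature `tau` are all illustrative assumptions.

```python
import numpy as np

def knn_affinity(z, k):
    """Symmetric 0/1 kNN affinity matrix from cosine similarity (hypothetical helper)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)            # a sample is never its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar samples
    a = np.zeros_like(sim)
    a[np.arange(len(z))[:, None], idx] = 1.0
    return np.maximum(a, a.T)                 # symmetrize the neighbor relation

def neighbor_contrastive_loss(z, a, tau=0.5):
    """InfoNCE-style loss with affinity-graph neighbors as positives (assumed form)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)         # exclude self-pairs from the softmax
    m = logits.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    pos = (np.exp(log_prob) * a).sum(axis=1)  # softmax mass assigned to neighbors
    return float(-np.log(pos).mean())

def neighbor_aware_pseudo_labels(p, a):
    """Hard labels from soft assignments p, summed over each sample and its neighbors."""
    return (p + a @ p).argmax(axis=1)
```

In an alternating scheme of the kind the abstract describes, the loss would drive the encoder update and the pseudo-labels would be recomputed from the refreshed embeddings. How GLCC adapts the affinity graph and weights the two objectives is not specified here, so the sketch leaves both out.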