Contrastive learning is a prominent paradigm in graph self-supervised learning. However, it requires negative samples to prevent model collapse and to learn discriminative representations. These negative samples inevitably incur heavy computation, memory overhead, and class collision, compromising representation learning. Recent studies show that methods obviating negative samples, exemplified by bootstrapped graph latents (BGRL), can attain competitive performance and improved scalability. However, BGRL neglects the inherent homophily of graphs, which provides valuable cues about underlying positive pairs. Our motivation arises from the observation that subtly introducing a few ground-truth positive pairs significantly improves BGRL. Although ground-truth positive pairs are unavailable without labels in the self-supervised setting, edges in the graph can serve as noisy positive pairs, since neighboring nodes often share the same label. We therefore propose expanding the positive pair set with node-neighbor pairs. We then introduce a cross-attention module that predicts the supportiveness score of each neighbor with respect to the anchor node. This score quantifies the positive support contributed by each neighboring node and is encoded into the training objective. Consequently, our method mitigates class collision arising from both negative and noisy positive samples while enhancing intra-class compactness. Extensive experiments on five benchmark datasets and three downstream tasks, namely node classification, node clustering, and node similarity search, demonstrate that our method produces node representations with enhanced intra-class compactness and achieves state-of-the-art performance.
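The cross-attention scoring described above can be illustrated with a minimal sketch: the anchor node's embedding forms a query, each neighbor's embedding forms a key, and a softmax over scaled dot-products yields a supportiveness score per neighbor. The projection matrices `W_q`, `W_k` and the exact scoring form here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def supportiveness_scores(anchor, neighbors, W_q, W_k):
    """Sketch of cross-attention scoring (assumed form, not the paper's exact module).

    anchor:    (d,)   embedding of the anchor node
    neighbors: (n, d) embeddings of its n neighbors
    Returns a distribution of supportiveness scores over the neighbors.
    """
    q = anchor @ W_q                        # query projected from the anchor
    k = neighbors @ W_k                     # keys projected from neighbors
    logits = k @ q / np.sqrt(q.shape[0])    # scaled dot-product attention logits
    return softmax(logits)                  # nonnegative scores summing to 1

# Toy usage with random embeddings (hypothetical dimensions).
rng = np.random.default_rng(0)
d = 8
anchor = rng.normal(size=d)
neighbors = rng.normal(size=(4, d))
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
scores = supportiveness_scores(anchor, neighbors, W_q, W_k)
print(scores)
```

In a BGRL-style objective, such scores could weight each node-neighbor positive pair, so that low-support (likely noisy) neighbors contribute less to the alignment loss.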