In recent years, deep learning on graphs has achieved remarkable success across a wide range of domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and made significant progress. SSL enables machine learning models to produce informative representations from unlabeled graph data, reducing dependence on expensive labeled data. While SSL on graphs has seen widespread adoption, one critical component, Graph Contrastive Learning (GCL), has not been thoroughly investigated in the existing literature. This survey aims to fill that gap by offering a dedicated review of GCL. We provide a comprehensive overview of the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and contrastive optimization objectives. Furthermore, we explore extensions of GCL to other aspects of data-efficient graph learning, such as weakly supervised learning, transfer learning, and related scenarios. We also discuss practical applications spanning domains such as drug discovery, genomics analysis, and recommender systems, and conclude by outlining the challenges and potential future directions of this field.
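To make the contrastive optimization objectives mentioned above concrete, the following is a minimal sketch of an InfoNCE-style loss, the objective most commonly used in GCL. It assumes two embedding matrices produced by encoding two augmented views of the same graph; the pairing of row `i` in each view as a positive pair, and the temperature value, are illustrative choices, not prescribed by the survey.

```python
import numpy as np

def info_nce_loss(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss between two augmented views.

    z1, z2: (n, d) embeddings of the same n nodes under two augmentations.
    Row i of z1 and row i of z2 form a positive pair; all other rows in z2
    serve as negatives for row i. tau is the temperature hyperparameter.
    """
    # L2-normalize rows so dot products become cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # (n, n) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability shift
    exp = np.exp(sim)
    pos = np.diag(exp)                           # positives on the diagonal
    loss = -np.log(pos / exp.sum(axis=1))
    return loss.mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
# A lightly perturbed copy stands in for the second augmented view.
z2 = z1 + 0.01 * rng.normal(size=(8, 16))
print(info_nce_loss(z1, z2))
```

Aligned views yield a much lower loss than unrelated embeddings, which is exactly the pressure that pulls augmented views of the same node together while pushing other nodes apart.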