SimGRACE: A Simple Framework for Graph Contrastive Learning without Data Augmentation

Graph contrastive learning (GCL) has emerged as a dominant technique for graph representation learning; it maximizes the mutual information between paired graph augmentations that share the same semantics. Unfortunately, given the diverse nature of graph data, it is difficult to preserve semantics well during augmentation. Currently, the semantics-preserving data augmentations used in GCL are obtained in one of three unsatisfactory ways. First, they can be picked manually for each dataset by trial and error. Second, they can be selected via cumbersome search. Third, they can be obtained by introducing expensive domain-specific knowledge as guidance. All of these limit the efficiency and the general applicability of existing GCL methods. To circumvent these crucial issues, we propose a \underline{Sim}ple framework for \underline{GRA}ph \underline{C}ontrastive l\underline{E}arning, \textbf{SimGRACE} for brevity, which does not require data augmentations. Specifically, we take the original graph as input and use a GNN model and its perturbed version as two encoders to obtain two correlated views for contrast. SimGRACE is inspired by the observation that graph data preserve their semantics well under encoder perturbations, and such perturbations require no manual trial and error, cumbersome search, or expensive domain knowledge for augmentation selection. We also explain why SimGRACE succeeds. Furthermore, we devise an adversarial training scheme, dubbed \textbf{AT-SimGRACE}, to enhance the robustness of graph contrastive learning, and we explain the reasons theoretically. Albeit simple, SimGRACE is shown to yield competitive or better performance compared with state-of-the-art methods in terms of generalizability, transferability, and robustness, while enjoying an unprecedented degree of flexibility and efficiency.
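The two-encoder idea can be illustrated with a small NumPy sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the toy GCN-style encoder with a sum readout, the perturbation rule (adding Gaussian noise whose scale is each weight matrix's standard deviation times a magnitude `eta`), the NT-Xent-style loss with temperature `tau`, and the helper names `gnn_encode`, `perturb`, and `nt_xent` are all hypothetical. The point it shows is that no graph is augmented: both views come from the same input graph, encoded by two different parameter sets.

```python
import numpy as np

def gnn_encode(adj, feats, weights):
    """Hypothetical toy GNN: each layer computes ReLU((A + I) H W); sum readout gives a graph embedding."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    h = feats
    for w in weights:
        h = np.maximum(a_hat @ h @ w, 0.0)
    return h.sum(axis=0)

def perturb(weights, eta, rng):
    """Assumed perturbation: W' = W + eta * noise, with noise ~ N(0, std(W)^2) drawn per layer."""
    return [w + eta * rng.normal(0.0, w.std(), size=w.shape) for w in weights]

def nt_xent(z1, z2, tau=0.2):
    """NT-Xent-style loss (assumed form): z1[i] and z2[i] are positives, other z2[j] are negatives."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)  # cosine similarity via unit vectors
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    sim = np.exp(z1 @ z2.T / tau)
    pos = np.diag(sim)
    neg = sim.sum(axis=1) - pos  # denominator excludes the positive pair
    return float(-np.log(pos / neg).mean())

# Usage: a batch of three small random undirected graphs with 4-dimensional node features.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]
graphs = []
for n in (5, 6, 7):
    upper = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    adj = (upper + upper.T).astype(float)
    graphs.append((adj, rng.normal(size=(n, 4))))

perturbed = perturb(weights, eta=1.0, rng=rng)
z1 = np.stack([gnn_encode(a, x, weights) for a, x in graphs])    # view from the original encoder
z2 = np.stack([gnn_encode(a, x, perturbed) for a, x in graphs])  # view from the perturbed encoder
loss = nt_xent(z1, z2)
```

Each graph's positive pair is its embedding under the original and the perturbed encoder, and the other graphs in the batch serve as negatives. In a training loop one would presumably draw a fresh perturbation at every step and update only the original weights; that schedule is likewise an assumption of this sketch.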
