GTC: GNN-Transformer Co-contrastive Learning for Self-supervised Heterogeneous Graph Representation

Graph Neural Networks (GNNs) have emerged as a powerful tool for a wide range of graph tasks, owing to the strong local information aggregation of the message-passing mechanism. However, over-smoothing has long prevented GNNs from going deeper and capturing multi-hop neighbors. Transformers, by contrast, can model global information and multi-hop interactions via multi-head self-attention, and a properly designed Transformer structure is more resistant to over-smoothing. This raises a question: can a single framework combine a GNN and a Transformer, integrating the GNN's local information aggregation with the Transformer's global information modeling, to eliminate the over-smoothing problem? To this end, this paper proposes a collaborative learning scheme for GNN-Transformer and constructs the GTC architecture. GTC uses a GNN branch and a Transformer branch to encode node information from different views, and establishes contrastive learning tasks on the encoded cross-view information to realize self-supervised heterogeneous graph representation learning. For the Transformer branch, we propose Metapath-aware Hop2Token and CG-Hetphormer, which cooperate with the GNN to attentively encode neighborhood information from different levels. To the best of our knowledge, this is the first attempt in graph representation learning to use both a GNN and a Transformer to collaboratively capture information from different views and conduct cross-view contrastive learning. Experiments on real datasets show that GTC outperforms state-of-the-art methods. Code is available at https://github.com/PHD-lanyu/GTC.
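
The cross-view contrastive objective mentioned above can be illustrated with a minimal sketch. It assumes a symmetric InfoNCE-style loss in which the only positive pair for a node is its own embedding in the other view; the exact loss, positive-sample selection, and hyperparameters used by GTC are not given in this abstract. The function names (`info_nce`, `cross_view_loss`) and the `temperature` and `weight` parameters are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = [l2_normalize(row) for row in a]
    b_n = [l2_normalize(row) for row in b]
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b_n] for ra in a_n]


def info_nce(anchor, other, temperature=0.5):
    """Mean InfoNCE loss; anchor[i] and other[i] are assumed to be the positive pair."""
    sim = cosine_sim_matrix(anchor, other)
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def cross_view_loss(z_gnn, z_trans, temperature=0.5, weight=0.5):
    """Weighted sum of both contrast directions (assumed form, not taken from the paper)."""
    gnn_to_trans = info_nce(z_gnn, z_trans, temperature)
    trans_to_gnn = info_nce(z_trans, z_gnn, temperature)
    return weight * gnn_to_trans + (1 - weight) * trans_to_gnn


# Toy embeddings for three nodes, standing in for the outputs of the two branches.
z_gnn = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_trans = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
print(cross_view_loss(z_gnn, z_trans))
```

In this sketch the loss is small when the two views agree on each node and grows when a node's embedding in one view is closer to other nodes in the other view, which is the behavior a cross-view contrastive task is meant to encourage.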
