Self-supervised node representation learning aims to learn node representations from unlabelled graphs that rival their supervised counterparts. The key to learning informative node representations lies in effectively capturing contextual information from the graph structure. In this work, we present a simple yet effective approach to self-supervised node representation learning that aligns the hidden representations of nodes with those of their neighbourhoods. Our first idea achieves this node-to-neighbourhood alignment by directly maximizing the mutual information between the two representations, which, as we prove theoretically, plays the role of graph smoothing. The framework is optimized via a surrogate contrastive loss, and we propose a Topology-Aware Positive Sampling (TAPS) strategy that samples positives by considering the structural dependencies between nodes, which enables offline positive selection. Because contrastive learning incurs excessive memory overhead, we further propose a negative-free solution whose main contribution is a Graph Signal Decorrelation (GSD) constraint that prevents representation collapse and over-smoothing. The GSD constraint unifies some existing constraints and can be used to derive new implementations to combat representation collapse. By applying our methods on top of simple MLP-based node representation encoders, we learn node representations that achieve promising node classification performance on a set of graph-structured datasets ranging from small to large scale.
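To make the node-to-neighbourhood alignment idea concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the neighbourhood representation is the mean of the neighbours' representations and uses a standard InfoNCE-style loss as the surrogate contrastive objective, with all other nodes' neighbourhoods acting as negatives. The function names (`mean_neighbourhood`, `n2n_infonce`), the temperature value, and the toy graph are hypothetical; TAPS and the GSD constraint are not modelled here.

```python
import numpy as np


def mean_neighbourhood(h, adj):
    """Mean of each node's neighbour representations (assumed aggregator)."""
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated nodes (degree 0) get an all-zero neighbourhood vector.
    return (adj @ h) / np.maximum(deg, 1.0)


def l2_normalise(x, eps=1e-12):
    """Scale each row to unit length, guarding against zero rows."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def n2n_infonce(h, adj, temperature=0.5):
    """InfoNCE-style loss pairing node i with its own neighbourhood summary.

    Positive pair: (node i, neighbourhood of i).
    Negatives: (node i, neighbourhood of j) for every j != i.
    """
    z = l2_normalise(h)                           # node representations
    s = l2_normalise(mean_neighbourhood(h, adj))  # neighbourhood representations
    logits = (z @ s.T) / temperature              # (N, N) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy graph with two disconnected edges: 0-1 and 2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

# Representations that agree with the graph structure...
aligned = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
# ...and representations that disagree with it.
misaligned = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

loss_aligned = n2n_infonce(aligned, adj)
loss_misaligned = n2n_infonce(misaligned, adj)
```

In this toy setting, representations that match their neighbours yield a lower loss than representations that do not, which is the smoothing behaviour the alignment objective is meant to encourage. The dense N-by-N similarity matrix also illustrates why negatives become a memory concern on large graphs.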