Self-supervised node representation learning aims to learn node representations from unlabelled graphs that rival their supervised counterparts. The key to learning informative node representations lies in effectively capturing contextual information from the graph structure. In this work, we present a simple yet effective self-supervised node representation learning approach that aligns the hidden representations of nodes with those of their neighbourhoods. Our first idea achieves this node-to-neighbourhood alignment by directly maximizing the mutual information between the two representations, which, as we prove theoretically, plays the role of graph smoothing. The framework is optimized via a surrogate contrastive loss, and we propose a Topology-Aware Positive Sampling (TAPS) strategy that samples positives by considering the structural dependencies between nodes, which enables offline positive selection. Because contrastive learning incurs excessive memory overhead, we further propose a negative-free solution whose main contribution is a Graph Signal Decorrelation (GSD) constraint that avoids representation collapse and over-smoothing. The GSD constraint unifies some existing constraints and can be used to derive new implementations to combat representation collapse. By applying our methods on top of simple MLP-based node representation encoders, we learn node representations that achieve promising node classification performance on a set of graph-structured datasets ranging from small to large scale.
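To make the node-to-neighbourhood alignment idea concrete, the following is a minimal sketch of an InfoNCE-style surrogate contrastive loss. It is an illustration under stated assumptions, not the authors' implementation: the neighbourhood summary is assumed to be a plain mean of neighbour embeddings, the negatives are assumed to be the neighbourhood summaries of all other nodes, and the names `n2n_infonce`, `mean_neighbourhood`, and the temperature `tau` are hypothetical. The embeddings stand in for the output of an MLP encoder, and TAPS positive sampling is not modelled.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def mean_neighbourhood(h, neighbours):
    # Assumed aggregator: average the embeddings of a node's neighbours.
    dim = len(h[0])
    return [sum(h[j][d] for j in neighbours) / len(neighbours) for d in range(dim)]

def n2n_infonce(h, adj, tau=0.5):
    # InfoNCE-style loss: each node is pulled towards its own neighbourhood
    # summary and pushed away from the summaries of all other nodes.
    nodes = sorted(adj)
    z = {i: l2_normalize(h[i]) for i in nodes}
    s = {i: l2_normalize(mean_neighbourhood(h, adj[i])) for i in nodes}
    total = 0.0
    for pos, i in enumerate(nodes):
        # Cosine similarity of node i with every neighbourhood summary.
        logits = [sum(a * b for a, b in zip(z[i], s[j])) / tau for j in nodes]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[pos]
    return total / len(nodes)

# Toy graph (hypothetical): two disconnected triangles.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# Embeddings that agree with the graph structure (one direction per triangle).
aligned = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
# Embeddings that ignore the graph structure.
shuffled = [[1.0, 0.0], [0.0, 1.0]] * 3

loss_aligned = n2n_infonce(aligned, adj)
loss_shuffled = n2n_infonce(shuffled, adj)
print(loss_aligned, loss_shuffled)
```

In this toy setting, embeddings that match the graph structure yield a lower loss than embeddings that ignore it, which is the smoothing-like behaviour the alignment objective is meant to encourage. The loop compares every node with every neighbourhood summary, so the cost grows quadratically with the number of nodes; this is one way to see the memory pressure that motivates a negative-free alternative.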