Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL), a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations and removes the need to contrast with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks, while achieving a 2-10x reduction in memory costs. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime, achieving state-of-the-art performance and improving over supervised baselines where representations are shaped only through label information. In particular, our solution centered on BGRL constituted one of the winning entries to the Open Graph Benchmark - Large Scale Challenge at KDD Cup 2021, on a graph orders of magnitude larger than all previously available benchmarks, thus demonstrating the scalability and effectiveness of our approach.
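To make the idea of "predicting alternative augmentations" without negative examples concrete, the following is a minimal NumPy sketch of one forward pass of a bootstrapping objective. It is an illustration under stated assumptions, not the implementation described here: the specific augmentations (random edge dropping and feature masking), the toy one-layer encoder, the cosine-similarity loss, the moving-average target update, and every function and parameter name are hypothetical choices that the text above does not specify.

```python
import numpy as np

def augment(adj, feats, rng, p_edge=0.2, p_feat=0.2):
    """Assumed simple augmentation: randomly drop edges and mask feature columns."""
    keep = np.triu(rng.random(adj.shape) >= p_edge, 1)
    keep = keep | keep.T                      # keep the edge mask symmetric
    feat_keep = rng.random(feats.shape[1]) >= p_feat
    return adj * keep, feats * feat_keep

def encode(adj, feats, weight):
    """Toy encoder (assumption): average over self and neighbours, then a linear map."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return np.tanh(a_hat @ feats @ weight)

def cosine_loss(pred, target, eps=1e-8):
    """1 - mean cosine similarity between matching rows.

    Node i in one view is compared only with node i in the other view,
    so no negative pairs are formed and the cost is linear in the node count.
    """
    pred = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    target = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    return 1.0 - np.mean(np.sum(pred * target, axis=1))

def ema_update(target_w, online_w, decay=0.99):
    """Assumed target update: exponential moving average of the online weights."""
    return decay * target_w + (1.0 - decay) * online_w

rng = np.random.default_rng(0)
n, d, h = 6, 8, 4                             # nodes, input features, embedding size
adj = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
adj = adj + adj.T                             # random undirected toy graph
feats = rng.normal(size=(n, d))

online_w = rng.normal(size=(d, h))
target_w = online_w.copy()
predictor_w = rng.normal(size=(h, h))

# Two independently augmented views of the same graph.
adj1, x1 = augment(adj, feats, rng)
adj2, x2 = augment(adj, feats, rng)

# The online branch predicts the target branch's embedding of the other view.
pred = encode(adj1, x1, online_w) @ predictor_w
target = encode(adj2, x2, target_w)           # treated as a constant (no gradient)
loss = cosine_loss(pred, target)

# After a gradient step on online_w and predictor_w (omitted), refresh the target.
target_w = ema_update(target_w, online_w)
print(f"bootstrap loss: {loss:.4f}")
```

The point of the sketch is the shape of the computation: the loss touches only the N matching node pairs across the two views, whereas a contrastive objective would also score each node against a set of negatives. A real training loop would additionally need automatic differentiation to update the online encoder and predictor.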