Learning representations on large graphs is a long-standing challenge because of the inter-dependence among massive numbers of data points. Transformers, an emerging class of foundation encoders for graph-structured data, have shown promising performance on small graphs thanks to their global attention, which captures all-pair influence beyond neighboring nodes. Even so, existing approaches tend to inherit the spirit of Transformers in language and vision tasks and embrace complicated models built by stacking deep multi-head attention layers. In this paper, we critically demonstrate that even a single attention layer can deliver surprisingly competitive performance across node property prediction benchmarks whose node counts range from the thousand level to the billion level. This encourages us to rethink the design philosophy for Transformers on large graphs, where global attention is a computational overhead that hinders scalability. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model that efficiently propagates information among arbitrary nodes in one layer. SGFormer requires no positional encodings, feature/graph pre-processing, or augmented loss. Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA Transformers on medium-sized graphs. Beyond the current results, we believe the proposed methodology alone opens a new technical path of independent interest for building Transformers on large graphs.
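To make the idea of one-layer, all-pair propagation concrete, the following is a minimal NumPy sketch. It is not SGFormer's attention function, which the abstract does not specify; it swaps in a generic kernelized linear attention with the feature map elu(x) + 1 as a stand-in. The function names, the weight matrices `Wq`, `Wk`, `Wv`, and the feature map are all assumptions made for illustration. The point it shows is that every node can attend to every other node in a single layer without materializing the N x N attention matrix, by computing the small product K^T V first.

```python
import numpy as np


def feature_map(x):
    # Assumed kernel feature map elu(x) + 1, which is strictly positive,
    # so the attention normalizer below can never be zero.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_global_attention(X, Wq, Wk, Wv):
    """Single-layer all-pair attention in O(N * d^2) time and O(N * d) memory.

    X: (N, d_in) node features; Wq, Wk: (d_in, d); Wv: (d_in, d_v).
    All names are hypothetical and chosen for this sketch only.
    """
    Q = feature_map(X @ Wq)          # (N, d)
    K = feature_map(X @ Wk)          # (N, d)
    V = X @ Wv                       # (N, d_v)
    KV = K.T @ V                     # (d, d_v): aggregate over all nodes once
    k_sum = K.sum(axis=0)            # (d,): shared normalizer statistics
    numer = Q @ KV                   # (N, d_v)
    denom = Q @ k_sum                # (N,)
    return numer / denom[:, None]


def quadratic_global_attention(X, Wq, Wk, Wv):
    """Reference version that builds the full (N, N) attention matrix."""
    Q = feature_map(X @ Wq)
    K = feature_map(X @ Wk)
    V = X @ Wv
    A = Q @ K.T                               # (N, N) pairwise scores
    A = A / A.sum(axis=1, keepdims=True)      # row-normalize
    return A @ V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16))
    Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
    out = linear_global_attention(X, Wq, Wk, Wv)
    print(out.shape)
```

Both functions compute the same output by associativity of matrix multiplication; only the order of operations differs. The linear version scales with the number of nodes N rather than N squared, which is the kind of saving that makes global attention feasible on graphs with very many nodes.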