Graph Transformers are attracting increasing attention in machine learning and have demonstrated state-of-the-art performance on graph representation learning benchmarks. However, current Graph Transformer implementations focus primarily on learning representations of small-scale graphs, and the quadratic complexity of the global self-attention mechanism makes full-batch training challenging on larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) to address these challenges. HSGT scales the Transformer architecture to node representation learning tasks on large-scale graphs while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Combined with sampling-based training methods, HSGT captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance with high efficiency on large-scale benchmarks with graphs containing millions of nodes.
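To make the idea of a graph hierarchy built by coarsening more concrete, the following is a minimal sketch of a single coarsening step. It is not HSGT's actual procedure: the function name `coarsen`, the dense adjacency representation, the precomputed cluster assignment, and the mean pooling of features are all assumptions chosen for illustration. The abstract does not specify which coarsening algorithm or aggregation is used.

```python
import numpy as np

def coarsen(adj, feats, assign):
    """Collapse nodes into super-nodes given one cluster id per node.

    Hypothetical helper for illustration only (not taken from HSGT).
    adj:    (n, n) dense symmetric 0/1 adjacency matrix
    feats:  (n, d) node feature matrix
    assign: (n,) integer cluster id in [0, k); every cluster is non-empty
    Returns the (k, k) coarse adjacency and (k, d) mean-pooled features.
    """
    n = adj.shape[0]
    k = int(assign.max()) + 1
    # One-hot membership matrix: P[i, c] = 1 if node i belongs to cluster c.
    P = np.zeros((n, k))
    P[np.arange(n), assign] = 1.0
    # Two super-nodes are linked if any pair of their members is linked.
    coarse_adj = (P.T @ adj @ P > 0).astype(float)
    np.fill_diagonal(coarse_adj, 0.0)  # drop self-loops from intra-cluster edges
    # Each super-node feature is the mean of its members' features.
    sizes = P.sum(axis=0)[:, None]
    coarse_feats = (P.T @ feats) / sizes
    return coarse_adj, coarse_feats

# Toy input: a path graph 0-1-2-3-4-5 grouped into three pairs.
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
feats = np.arange(12, dtype=float).reshape(6, 2)
assign = np.array([0, 0, 1, 1, 2, 2])

coarse_adj, coarse_feats = coarsen(adj, feats, assign)
print(coarse_adj)    # 3-node path: [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(coarse_feats)  # [[1, 2], [5, 6], [9, 10]]
```

In this toy case, full self-attention over the original graph scores 6 × 6 = 36 node pairs, while attention over the coarse level scores only 3 × 3 = 9. Applying such a step repeatedly yields a multi-level hierarchy in which higher levels carry coarser, more global context at lower cost.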