Graph Transformers (GTs) show considerable potential in graph representation learning. GT architectures typically integrate Graph Neural Networks (GNNs) with a global attention mechanism, either in parallel or with the GNN preceding the attention module, yielding a local-and-global or local-to-global attention scheme. However, because the global attention mechanism primarily captures long-range dependencies between nodes, these integration schemes may suffer from information loss: the local neighborhood information learned by the GNN can be diluted by the attention mechanism. We therefore propose G2LFormer, which features a novel global-to-local attention scheme in which the shallow layers use attention mechanisms to capture global information while the deeper layers employ GNN modules to learn local structural information, preventing nodes from ignoring their immediate neighbors. An effective cross-layer information fusion strategy allows the local layers to retain beneficial information from the global layers and alleviates information loss, at an acceptable cost in scalability. To validate the feasibility of the global-to-local attention scheme, we compare G2LFormer with state-of-the-art linear GTs and GNNs on node-level and graph-level tasks. The results indicate that G2LFormer achieves excellent performance while retaining linear complexity.
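The global-to-local scheme described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the feature map, the fusion coefficient `alpha`, and all function names are assumptions introduced here for clarity. Shallow layers apply a softmax-free linear attention (global, O(N·d²) rather than O(N²·d)), deeper layers apply a GCN-like neighborhood aggregation (local), and a cross-layer fusion term mixes the last global representation into each local layer so that neighborhood updates do not wash it out.

```python
import numpy as np

def linear_attention(X):
    # Kernelized, softmax-free attention: phi(Q) (phi(K)^T V), which costs
    # O(N d^2) instead of the O(N^2 d) of full pairwise attention.
    # Untrained identity projections are used purely for illustration.
    Q, K, V = X, X, X
    phi = lambda M: np.maximum(M, 0.0) + 1.0   # simple positive feature map (assumption)
    KV = phi(K).T @ V                          # d x d summary of keys/values
    Z = phi(Q) @ KV                            # N x d
    norm = phi(Q) @ phi(K).sum(axis=0, keepdims=True).T  # N x 1 normalizer
    return Z / (norm + 1e-9)

def gnn_layer(A_hat, X):
    # Mean-style neighborhood aggregation (a GCN-like local update).
    return A_hat @ X

def g2l_forward(A, X, n_global=1, n_local=2, alpha=0.5):
    # Global-to-local ordering: attention layers first, GNN layers after.
    # `alpha` is a hypothetical cross-layer fusion weight that re-injects
    # the final global representation into every local layer.
    deg = A.sum(axis=1, keepdims=True)
    A_hat = A / np.maximum(deg, 1.0)           # row-normalized adjacency
    H = X
    for _ in range(n_global):
        H = linear_attention(H)
    H_global = H                                # output of the global stage
    for _ in range(n_local):
        H = (1.0 - alpha) * gnn_layer(A_hat, H) + alpha * H_global
    return H
```

On a path graph of three nodes with 4-dimensional features, `g2l_forward(A, X)` returns an updated `3 x 4` representation; the `alpha`-weighted residual is one simple way to realize the cross-layer fusion idea, though the paper's actual strategy may differ.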