Graph Transformers (GTs) show considerable potential in graph representation learning. GT architectures typically integrate Graph Neural Networks (GNNs) with a global attention mechanism, either in parallel or with the GNN preceding the attention module, yielding a local-and-global or local-to-global attention scheme. However, because the global attention mechanism primarily captures long-range dependencies between nodes, these integration schemes may suffer from information loss: the local neighborhood information learned by the GNN can be diluted by the attention mechanism. We therefore propose G2LFormer, which features a novel global-to-local attention scheme in which the shallow layers use attention mechanisms to capture global information while the deeper layers employ GNN modules to learn local structural information, preventing nodes from ignoring their immediate neighbors. An effective cross-layer information fusion strategy allows the local layers to retain beneficial information from the global layers and alleviates information loss, at an acceptable cost in scalability. To validate the feasibility of the global-to-local attention scheme, we compare G2LFormer with state-of-the-art linear GTs and GNNs on node-level and graph-level tasks. The results indicate that G2LFormer achieves excellent performance while retaining linear complexity.
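The global-to-local scheme described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the feature map, the fusion coefficient `alpha`, and all function names are assumptions introduced here for clarity. Shallow layers apply a softmax-free linear attention (global, O(N·d²) rather than O(N²·d)), deeper layers apply a GCN-like neighborhood aggregation (local), and a cross-layer fusion term mixes the last global representation into each local layer so that neighborhood updates do not wash it out.

```python
import numpy as np

def linear_attention(X):
    # Kernelized, softmax-free attention: phi(Q) (phi(K)^T V), which costs
    # O(N d^2) instead of the O(N^2 d) of full pairwise attention.
    # Untrained identity projections are used purely for illustration.
    Q, K, V = X, X, X
    phi = lambda M: np.maximum(M, 0.0) + 1.0   # simple positive feature map (assumption)
    KV = phi(K).T @ V                          # d x d summary of keys/values
    Z = phi(Q) @ KV                            # N x d
    norm = phi(Q) @ phi(K).sum(axis=0, keepdims=True).T  # N x 1 normalizer
    return Z / (norm + 1e-9)

def gnn_layer(A_hat, X):
    # Mean-style neighborhood aggregation (a GCN-like local update).
    return A_hat @ X

def g2l_forward(A, X, n_global=1, n_local=2, alpha=0.5):
    # Global-to-local ordering: attention layers first, GNN layers after.
    # `alpha` is a hypothetical cross-layer fusion weight that re-injects
    # the final global representation into every local layer.
    deg = A.sum(axis=1, keepdims=True)
    A_hat = A / np.maximum(deg, 1.0)           # row-normalized adjacency
    H = X
    for _ in range(n_global):
        H = linear_attention(H)
    H_global = H                                # output of the global stage
    for _ in range(n_local):
        H = (1.0 - alpha) * gnn_layer(A_hat, H) + alpha * H_global
    return H
```

On a path graph of three nodes with 4-dimensional features, `g2l_forward(A, X)` returns an updated `3 x 4` representation; the `alpha`-weighted residual is one simple way to realize the cross-layer fusion idea, though the paper's actual strategy may differ.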