The Graph Transformer, with its global attention mechanism, has emerged as a new tool for handling graph-structured data. It is widely recognized that global attention provides a wider receptive field over a fully connected graph, leading many to believe that useful information can be extracted from all nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformers through both empirical evidence and theoretical analysis: the current attention mechanism overly focuses on distant nodes, while near nodes, which actually contain most of the useful information, are relatively weakened. We then propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), comprising inter-cluster and intra-cluster Transformers, to mitigate the over-globalizing problem while preserving the ability to extract valuable information from distant nodes. Moreover, collaborative training is introduced to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs validate the effectiveness of the proposed CoBFormer.
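The over-globalizing claim can be probed empirically by splitting each node's attention mass between near and distant neighbors. The sketch below is illustrative only, assuming a precomputed attention matrix and hop-distance matrix; the function name, the toy path graph, and the uniform attention weights are hypothetical and not taken from the paper.

```python
import numpy as np

def attention_mass_by_distance(attn, hops, k=2):
    """Split each row's attention mass into near (<= k hops) and distant (> k hops).

    attn: (n, n) row-stochastic attention matrix from a global Transformer layer.
    hops: (n, n) matrix of shortest-path hop distances between nodes.
    """
    near_mask = hops <= k
    near = (attn * near_mask).sum(axis=1)   # mass assigned to nearby nodes
    far = (attn * ~near_mask).sum(axis=1)   # mass assigned to distant nodes
    return near, far

# Toy example: 4-node path graph 0-1-2-3 with uniform global attention,
# i.e., every node attends equally to all nodes in the fully connected view.
hops = np.array([[0, 1, 2, 3],
                 [1, 0, 1, 2],
                 [2, 1, 0, 1],
                 [3, 2, 1, 0]])
attn = np.full((4, 4), 0.25)
near, far = attention_mass_by_distance(attn, hops, k=1)
print(near, far)  # endpoints place half their attention beyond 1 hop
```

If `far` dominates `near` on real graphs where labels are locally correlated, that is consistent with the over-globalizing problem described above.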