Despite that going deep has proven successful in many neural architectures, the existing graph transformers are relatively shallow. In this work, we explore whether more layers are beneficial to graph transformers, and find that current graph transformers suffer from the bottleneck of improving performance by increasing depth. Our further analysis reveals the reason is that deep graph transformers are limited by the vanishing capacity of global attention, restricting the graph transformer from focusing on the critical substructure and obtaining expressive features. To this end, we propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation, and applies local attention on related nodes to obtain substructure based attention encoding. Our model enhances the ability of the global attention to focus on substructures and promotes the expressiveness of the representations, addressing the limitation of self-attention as the graph transformer deepens. Experiments show that our method unblocks the depth limitation of graph transformers and results in state-of-the-art performance across various graph benchmarks with deeper models.
翻译:尽管深层结构在许多神经架构中已被证明是成功的,但现有的图Transformer相对较浅。本研究探讨了增加层数是否有利于图Transformer,并发现当前图Transformer存在通过增加深度提升性能的瓶颈。进一步分析揭示,深层图Transformer受限于全局注意力的容量衰减问题,这限制了图Transformer聚焦关键子结构并获取富有表达力的特征。为此,我们提出一种名为DeepGraph的新型图Transformer模型,该模型在编码表示中显式嵌入子结构令牌,并对相关节点应用局部注意力以获取基于子结构的注意力编码。我们的模型增强了全局注意力聚焦子结构的能力,提升了表示的丰富表达能力,从而解决了自注意力随图Transformer加深而出现的局限性。实验表明,我们的方法突破了图Transformer的深度限制,在多种图基准测试中通过更深层模型实现了最先进的性能。