Graph transformers are the state of the art for learning from graph-structured data and are empirically known to avoid several pitfalls of message-passing architectures. However, there is limited theoretical analysis of why these models perform well in practice. In this work, we prove that attention-based architectures have structural benefits over graph convolutional networks in node-level prediction tasks. Specifically, we study the neural network Gaussian process limits of graph transformers (GAT, Graphormer, Specformer) with infinite width and infinitely many heads, and derive the node-level and edge-level kernels across the layers. Our results characterise how the node features and the graph structure propagate through the graph attention layers. As a specific example, we prove that graph transformers structurally preserve community information and maintain discriminative node representations even in deep layers, thereby preventing oversmoothing. We provide empirical evidence on synthetic and real-world graphs that validates our theoretical insights, for example that integrating informative priors and positional encoding can improve the performance of deep graph transformers.