Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: graphs of different sizes may require different approximating networks. We first clarify that this form of universality is not unique to GTs: using the same positional encodings, pure MPGNNs and even 2-layer MLPs are also non-uniform universal approximators. We then consider uniform expressivity: the target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions lies in their method of global computation: Self-Attention vs. Virtual Node. We prove that neither model is a uniform-universal approximator, before proving our main result: neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data. We further augment our study with real-world datasets, observing mixed results that indicate no clear ranking in practice either.
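The contrast between the two global-computation mechanisms can be illustrated with a minimal sketch. This is not the paper's exact model definitions: the weight matrices, the `tanh` nonlinearities, and the mean aggregation for the virtual node are illustrative assumptions. The key structural difference it shows is that Self-Attention mixes nodes with input-dependent pairwise weights, while a Virtual Node aggregates all nodes with uniform weights and broadcasts one global state back.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4                      # toy graph: 5 nodes, feature dim 4
X = rng.normal(size=(n, d))      # node features

# --- Global computation via Self-Attention (as in GTs such as SAN/GPS) ---
# Each node attends to every other node with input-dependent weights.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)    # pairwise attention logits
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)              # row-wise softmax
out_attention = A @ (X @ Wv)                   # node-wise mix over all nodes

# --- Global computation via a Virtual Node (MPGNN + VN) ---
# The virtual node aggregates all nodes *uniformly* (here: mean),
# then its single state is broadcast back to every node.
W_up, W_down = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = np.tanh(X.mean(axis=0) @ W_up)             # virtual-node state
out_virtual = X + np.tanh(v @ W_down)          # same global signal to all nodes

print(out_attention.shape, out_virtual.shape)  # (5, 4) (5, 4)
```

Both mechanisms let information travel between any two nodes in one step; the sketch makes visible why neither trivially simulates the other, since the attention weights in `A` vary per node pair while the virtual node applies one shared global state to every node.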