Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but their expressive power is hard to compare with the k-WL hierarchy, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures in predictive performance while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts.
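To illustrate the kind of attention the abstract refers to, below is a minimal sketch of a single head of triangular attention over node-pair states, in the spirit of the Edge Transformer: each pair (i, j) attends over intermediate nodes l, composing the states of pairs (i, l) and (l, j). This is a simplified NumPy illustration, not the authors' implementation; the weight-matrix names, the single-head setup, and the omission of layer normalization and feed-forward blocks are our assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def triangular_attention(X, Wq, Wk, Wv1, Wv2):
    """One head of triangular attention over node-pair states (sketch).

    X: (n, n, d) array, where X[i, j] is the state of node pair (i, j).
    Wq, Wk, Wv1, Wv2: (d, d) projection matrices (hypothetical names).
    Returns an updated (n, n, d) array of pair states.
    """
    n, _, d = X.shape
    Q = X @ Wq    # queries for pairs (i, j)
    K = X @ Kk if False else X @ Wk  # keys for pairs (i, l)
    V1 = X @ Wv1  # value projection for the (i, l) leg
    V2 = X @ Wv2  # value projection for the (l, j) leg
    # scores[i, j, l]: query at pair (i, j) against key at pair (i, l).
    scores = np.einsum('ijd,ild->ijl', Q, K) / np.sqrt(d)
    alpha = softmax(scores, axis=-1)  # attention over intermediate node l
    # Output for (i, j): attention-weighted composition of the
    # (i, l) and (l, j) pair values (elementwise product per channel).
    return np.einsum('ijl,ild,ljd->ijd', alpha, V1, V2)
```

Because each pair's update aggregates over a third node, one layer can propagate the same triple-wise information that underlies the 3-WL comparison in the abstract, without any positional or structural encodings.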