Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, which limits their practical impact. In contrast, global attention-based models such as graph transformers perform strongly in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly because these architectures rely on positional or structural encodings for both their expressivity and their predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures in predictive performance while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts