There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede the handcrafted operators characteristic of message passing schemes. However, concerns have been raised over their empirical effectiveness, scalability, and the complexity of their pre-processing steps, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To tackle these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn effective representations of edges, while also making it possible to handle misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node- and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across tasks ranging from molecular and vision graphs to heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar level of performance or expressive power.
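To make the architecture described above concrete, the following is a minimal PyTorch sketch of an encoder that treats the graph as a set of edge tokens and alternates masked with vanilla self-attention, followed by a learned-query attention pooling step. The module names, the residual/normalization layout, and the masking rule (restricting attention to pairs of edges that share an endpoint) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class InterleavedEdgeEncoder(nn.Module):
    """Stack of self-attention blocks over a set of edge tokens,
    alternating masked attention (assumed here to be restricted to
    edges sharing a node) with vanilla attention over the full set."""

    def __init__(self, dim: int, heads: int, layers: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(layers))

    def forward(self, edges: torch.Tensor, edge_mask: torch.Tensor) -> torch.Tensor:
        # edges: (B, E, dim) edge tokens.
        # edge_mask: (B * heads, E, E) boolean mask, True where attention
        # is disallowed (hypothetical construction: edges with no shared node).
        for i, (attn, norm) in enumerate(zip(self.blocks, self.norms)):
            # Interleave: masked attention on even layers, vanilla on odd ones.
            mask = edge_mask if i % 2 == 0 else None
            out, _ = attn(edges, edges, edges, attn_mask=mask)
            edges = norm(edges + out)  # residual + post-norm (an assumption)
        return edges


class AttentionPooling(nn.Module):
    """Pool the encoded edge set into a single graph embedding by
    cross-attending a learned seed query to all edge tokens."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.seed = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, edges: torch.Tensor) -> torch.Tensor:
        q = self.seed.expand(edges.size(0), -1, -1)
        pooled, _ = self.attn(q, edges, edges)
        return pooled.squeeze(1)  # (B, dim) graph-level representation
```

Note that the masked layers inject graph structure without any handcrafted message passing operator, while the vanilla layers let information flow between all edges, which is one plausible reading of how the interleaving tolerates misspecified input graphs.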