Graph Transformers typically rely on explicit positional or structural encodings and dense global attention to incorporate graph topology. In this work, we show that neither is essential. We introduce HopFormer, a graph Transformer that injects structure exclusively through head-specific n-hop masked sparse attention, without positional encodings or other architectural modifications. This design provides explicit and interpretable control over receptive fields while enabling genuinely sparse attention whose computational cost scales linearly with the number of retained attention entries. Through extensive experiments on both node-level and graph-level benchmarks, we demonstrate that our approach achieves competitive or superior performance across diverse graph structures. Our results further reveal that dense global attention is often unnecessary: on graphs with strong small-world properties, localized attention yields more stable and consistently high performance, while on graphs with weaker small-world effects, global attention offers diminishing returns. Together, these findings challenge prevailing assumptions in graph Transformer design and highlight sparsity-controlled attention as a principled and efficient alternative.
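The core mechanism, masking attention so each head only attends within an n-hop neighborhood, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names `n_hop_mask` and `masked_attention` and the dense NumPy formulation are our assumptions (a real sparse implementation would skip masked entries rather than materialize them).

```python
import numpy as np

def n_hop_mask(adj, n_hops):
    """Boolean mask: True where node j is reachable from node i
    within n_hops edges (self-attention always allowed).

    adj: (N, N) integer adjacency matrix of the graph.
    """
    N = adj.shape[0]
    # (A + I)^n has a nonzero entry (i, j) iff j is within n hops of i.
    reach = np.linalg.matrix_power(adj + np.eye(N, dtype=adj.dtype), n_hops)
    return reach > 0

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted to the given hop mask.

    Q, K, V: (N, d) per-node query/key/value matrices for one head.
    mask:    (N, N) boolean; False entries are excluded from softmax.
    """
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    # Disallowed pairs get -inf so their softmax weight is exactly zero.
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Per the abstract, different heads would receive different `n_hops` values, so each head's receptive field is controlled explicitly; with a sparse kernel, cost is proportional to the number of `True` entries in the mask.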