Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, their quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweight the attention scores. Experimental results show the superior performance of our MGFormer, even with a single attention layer.
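To illustrate how a kernelized attention module achieves linear complexity, the sketch below contrasts it with standard quadratic attention. This is a minimal illustration, not MGFormer itself: the feature map `elu(x) + 1` is a common choice from the linear-attention literature, and the masking and learnable relative degree reweighting described above are omitted.

```python
import numpy as np

def kernel_feature(x):
    # Positive feature map phi(x); elu(x) + 1 is a common choice
    # in kernelized (linear) attention. This is an assumed choice,
    # not necessarily the one used in MGFormer.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """All-pair attention in O(N d^2) time instead of O(N^2 d).

    The trick: with attention weights phi(Q) phi(K)^T, we can
    aggregate phi(K)^T V once (a d x d_v matrix) and reuse it for
    every query token, never materializing the N x N matrix.
    """
    phi_q, phi_k = kernel_feature(Q), kernel_feature(K)
    kv = phi_k.T @ V                  # (d, d_v) summary of keys/values
    z = phi_q @ phi_k.sum(axis=0)     # (N,) per-token normalizer
    return (phi_q @ kv) / z[:, None]  # (N, d_v)

# Sanity check: matches the explicit quadratic computation.
rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 4))
K = rng.standard_normal((6, 4))
V = rng.standard_normal((6, 3))
A = kernel_feature(Q) @ kernel_feature(K).T        # N x N, quadratic
reference = (A @ V) / A.sum(axis=1, keepdims=True)
print(np.allclose(linear_attention(Q, K, V), reference))
```

Because the `N x N` attention matrix is never formed, memory and time grow linearly in the number of user/item tokens, which is what makes all-pair interaction feasible at recommendation scale.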