We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the Transformer's attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach captures complex dependencies and generalizes across tasks, as evidenced by a reduced generalization gap and improved learning performance. Additionally, we expand the concept of graph-aware attention to introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs. By interpreting attention matrices as sparse adjacency graphs, this technique enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning achieves improved training dynamics and better generalization compared to alternative methods like low-rank adaptation (LoRA). We discuss latent graph-like structures within traditional attention mechanisms, offering a new lens through which Transformers can be understood. By evolving Transformers into hierarchical GIN models for relational reasoning, this perspective suggests profound implications for foundational model development, enabling the design of architectures that dynamically adapt to both local and global dependencies. Applications in bioinformatics, materials science, language modeling, and beyond could benefit from this synthesis of relational and sequential data modeling, setting the stage for interpretable and generalizable modeling strategies.
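The core idea behind Sparse GIN-Attention can be sketched in a few lines: treat a softmax attention matrix as a weighted adjacency graph, sparsify it by thresholding, and apply a GIN-style aggregation to the token embeddings. The sketch below is illustrative only, assuming NumPy and a single-layer stand-in for the GIN MLP; the function names, the threshold parameter `tau`, and the toy dimensions are our own choices, not the paper's implementation.

```python
import numpy as np

def gin_update(h, adj, eps=0.1):
    """One GIN-style step: h' = MLP((1 + eps) * h + sum_j A_ij h_j).
    A single linear map + ReLU stands in for the GIN MLP for brevity."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((h.shape[1], h.shape[1])) / np.sqrt(h.shape[1])
    agg = (1.0 + eps) * h + adj @ h          # self term plus weighted neighbor sum
    return np.maximum(agg @ w, 0.0)          # tiny stand-in for the GIN MLP

def sparse_gin_attention(h, attn, tau=0.1):
    """Interpret attention weights as a sparse adjacency graph (edges with
    weight >= tau survive), then run a GIN update over that graph."""
    adj = np.where(attn >= tau, attn, 0.0)   # sparsify: drop weak attention edges
    return gin_update(h, adj)

# Toy usage: 4 tokens, 8-dim embeddings, uniform row-stochastic "attention".
h = np.ones((4, 8))
attn = np.full((4, 4), 0.25)
out = sparse_gin_attention(h, attn, tau=0.2)
print(out.shape)  # (4, 8)
```

In the fine-tuning setting described above, such a sparse GIN branch would sit alongside the frozen pre-trained attention, adapting only the small GIN parameters, which is why the computational overhead stays minimal relative to retraining the full model.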