Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. However, mainstream architectures face critical scalability bottlenecks due to the explicit construction of geometric features or dense tensor products on \textit{every} edge. To overcome this, we introduce \textbf{E2Former-V2}, a scalable architecture that integrates algebraic sparsity with hardware-aware execution. We first propose \textbf{E}quivariant \textbf{A}xis-\textbf{A}ligned \textbf{S}parsification (EAAS). EAAS builds on Wigner-$6j$ convolution by exploiting an $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ change of basis to transform computationally expensive dense tensor contractions into efficient, sparse parity re-indexing operations. Building on this representation, we introduce \textbf{On-the-Fly Equivariant Attention}, a fully node-centric mechanism implemented via a custom fused Triton kernel. By eliminating materialized edge tensors and maximizing SRAM utilization, our kernel achieves a \textbf{20$\times$ improvement in TFLOPS} over standard implementations. Extensive experiments on the SPICE and OMol25 datasets demonstrate that E2Former-V2 maintains comparable predictive performance while notably accelerating inference. This work demonstrates that large equivariant transformers can be trained efficiently on widely accessible GPU platforms. The code is available at https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2.
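To make the sparsification claim concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the general $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ idea: once each edge is rotated to lie along the $z$-axis, an equivariant feature update no longer mixes all magnetic orders $m$ through dense Clebsch-Gordan contractions; each output order $m$ couples only to the $(+m, -m)$ input pair. The function name `so2_sparse_mix` and the per-$|m|$ weight layout are illustrative assumptions, not names from the paper.

```python
import numpy as np

def so2_sparse_mix(x, w_r, w_i):
    """Sketch of an SO(2)-sparsified mixing step for one degree-L irrep.

    x        : (2*L+1, C) real coefficients, rows indexed by m = -L..L
    w_r, w_i : (L+1,) illustrative learned weights, one pair per |m|
    Returns an array of the same shape. Only the (+m, -m) pair of rows
    interacts, so the cost is O(L*C) instead of a dense O(L^3)-style
    contraction over all (m1, m2, m3) triples.
    """
    L = (x.shape[0] - 1) // 2
    y = np.empty_like(x)
    y[L] = w_r[0] * x[L]              # m = 0 couples only to itself
    for m in range(1, L + 1):
        xp, xn = x[L + m], x[L - m]   # +m and -m components
        y[L + m] = w_r[m] * xp - w_i[m] * xn
        y[L - m] = w_i[m] * xp + w_r[m] * xn
    return y

rng = np.random.default_rng(0)
L, C = 2, 4                           # degree-2 irrep, 4 channels
x = rng.standard_normal((2 * L + 1, C))
y = so2_sparse_mix(x, rng.standard_normal(L + 1), rng.standard_normal(L + 1))
print(y.shape)                        # (5, 4)
```

The sparsity pattern here is the point: each row of the output touches at most two rows of the input, which is what turns a dense tensor contraction into a re-indexing operation amenable to a fused kernel.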