Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes O(N^2) complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves O(N log N) computational cost, requires fewer learnable parameters by streamlining fully connected layers, and introduces no additional heavy operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations. Based on the Engineering-Isomorphic Transformers (EITs) framework, CAT's design not only offers practical efficiency and ease of implementation but also provides insights to guide the development of future high-performance Transformer architectures. Finally, our ablation studies highlight the key conditions underlying CAT's success, shedding light on broader principles for scalable attention mechanisms.
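To make the complexity claim concrete, the sketch below shows how token mixing with a learned circular convolution can be evaluated in the Fourier domain in O(N log N) along the sequence axis, instead of forming an N×N attention matrix. This is a minimal illustrative example only: the class name `CircularConvAttention`, the per-channel filter parameterization, and the single output projection are assumptions made for the sketch, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


class CircularConvAttention(nn.Module):
    """Illustrative token mixer: a learned circular convolution applied via FFT.

    Hypothetical sketch of the FFT-based circular-convolution idea; the exact
    parameterization in CAT may differ.
    """

    def __init__(self, seq_len: int, dim: int):
        super().__init__()
        # One learnable circular filter per feature channel (an assumption of this sketch).
        self.filter = nn.Parameter(torch.randn(seq_len, dim) * 0.02)
        # Lightweight output projection in place of the usual attention output layer.
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        n = x.size(1)
        xf = torch.fft.rfft(x, n=n, dim=1)            # FFT over the token axis: O(N log N)
        wf = torch.fft.rfft(self.filter, n=n, dim=0)  # FFT of the learned filter
        # Pointwise product in the frequency domain == circular convolution in the token domain.
        y = torch.fft.irfft(xf * wf.unsqueeze(0), n=n, dim=1)
        return self.proj(y)


# Usage: mix a batch of 8 sequences of length 128 with 64 features.
layer = CircularConvAttention(seq_len=128, dim=64)
out = layer(torch.randn(8, 128, 64))  # -> (8, 128, 64)
```

Because the mixing weights live in a length-N filter rather than an N×N matrix, both the parameter count and the per-layer cost grow only mildly with sequence length, which is the property the abstract attributes to CAT.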