CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers

Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes O(N^2) complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves O(NlogN) computations, requires fewer learnable parameters by streamlining fully connected layers, and introduces no additional heavy operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations. Based on the Engineering-Isomorphic Transformers (EITs) framework, CAT's design not only offers practical efficiency and ease of implementation, but also provides insights to guide the development of future high-performance Transformer architectures. Finally, our ablation studies highlight the key conditions underlying CAT's success, shedding light on broader principles for scalable attention mechanisms.
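The abstract's complexity claim rests on a standard identity: a circular convolution of two length-N sequences equals the inverse Fourier transform of the elementwise product of their Fourier transforms, which costs O(N log N) instead of O(N^2). The sketch below illustrates only that identity in plain Python. It is not the authors' CAT implementation: the function names (`fft`, `circular_conv_fft`, `circular_conv_direct`) are hypothetical, the inputs are scalar sequences rather than multi-head tensors, and the radix-2 FFT assumes the length is a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT (illustrative helper, not from the paper).

    Assumes len(x) is a power of two. With inverse=True the sign of the
    exponent is flipped; the caller must still divide by len(x).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_conv_fft(a, v):
    """Circular convolution in O(N log N) via the convolution theorem."""
    n = len(a)
    if len(v) != n:
        raise ValueError("sequences must have the same length")
    fa, fv = fft(a), fft(v)
    prod = [p * q for p, q in zip(fa, fv)]
    # Inverse transform, then normalize by n and drop the (tiny) imaginary part.
    return [c.real / n for c in fft(prod, inverse=True)]


def circular_conv_direct(a, v):
    """Reference O(N^2) definition: y[i] = sum_j a[j] * v[(i - j) mod N]."""
    n = len(a)
    return [sum(a[j] * v[(i - j) % n] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical inputs: a plays the role of a length-N weight vector,
    # v a length-N sequence of scalar values.
    a = [0.5, 0.25, 0.125, 0.125]
    v = [1.0, 2.0, 3.0, 4.0]
    print(circular_conv_fft(a, v))
    print(circular_conv_direct(a, v))
```

Both functions should agree up to floating-point error. In a real model one would more likely use a library FFT than this hand-written one, and the direct version is kept here only as a correctness reference.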


Related content

Attention mechanism


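The attention mechanism was first proposed in the field of visual imaging, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously on machine translation tasks; their work is generally regarded as the first to apply attention to NLP. Extensions of attention-based RNN models then began to be applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.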

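[TPAMI 2023] PSLT: A Lightweight Vision Transformer with Ladder Self-Attention and Progressive Shift

Zhuanzhi Member Service

26+ reads · September 4, 2023

[CVPR 2023] BiFormer: Vision Transformer with Bi-Level Routing Attention

Zhuanzhi Member Service

35+ reads · March 20, 2023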