Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

The transformer model is known to be computationally demanding, and prohibitively costly for long sequences, because the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence, using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). Fourier Transformer significantly reduces computational costs while retaining the ability to inherit from various large pretrained models. Experiments show that our model achieves state-of-the-art performance among all transformer-based models on the long-range modeling benchmark LRA, with significant improvements in both speed and space. For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART and other efficient models. Our code is publicly available at https://github.com/LUMIA-Group/FourierTransformer
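The idea of computing a DCT with an FFT and then discarding high-frequency components to shorten a hidden sequence can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names `dct2_via_fft`, `dct_matrix`, and `shorten`, the choice of the orthonormal DCT-II, the mirrored-sequence FFT trick, and the amplitude rescaling are all assumptions made for the example.

```python
import numpy as np


def dct_matrix(n):
    """Explicit orthonormal DCT-II matrix (O(n^2)), used here for reference."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def dct2_via_fft(x):
    """Orthonormal DCT-II along axis 0, computed with a length-2N FFT."""
    n = x.shape[0]
    # Mirror the sequence so that its FFT carries the cosine-transform symmetry.
    mirrored = np.concatenate([x, x[::-1]], axis=0)
    spectrum = np.fft.fft(mirrored, axis=0)[:n]
    shape = (n,) + (1,) * (x.ndim - 1)
    k = np.arange(n).reshape(shape)
    # Half-sample phase shift; the result is real up to rounding error.
    coeffs = np.real(spectrum * np.exp(-1j * np.pi * k / (2 * n))) / 2
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return coeffs * scale.reshape(shape)


def shorten(hidden, keep):
    """Hypothetical helper: keep the `keep` lowest DCT frequencies of a
    (seq_len, dim) array and map them back to a length-`keep` sequence."""
    n = hidden.shape[0]
    coeffs = dct2_via_fft(hidden)[:keep]
    # Inverse orthonormal DCT of length `keep` (transpose of the matrix),
    # rescaled so that a constant sequence keeps its amplitude.
    return np.sqrt(keep / n) * (dct_matrix(keep).T @ coeffs)


rng = np.random.default_rng(0)
hidden = rng.standard_normal((16, 8))   # toy hidden sequence: 16 tokens, 8 dims
shorter = shorten(hidden, keep=4)       # 4 positions instead of 16
print(shorter.shape)
```

Because self-attention cost grows quadratically with sequence length, shortening the sequence in this way reduces the cost of every subsequent layer, while the layer weights themselves are left unchanged.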

