Toeplitz Neural Network for Sequence Modeling

From arXiv. Accepted to ICLR 2023 (Spotlight). Yiran Zhong is the corresponding author. A 15B pretrained LLM with TNN will be released at https://github.com/OpenNLPLab/Tnn soon.

Sequence modeling has important applications in natural language processing and computer vision. Recently, transformer-based models have shown strong performance on various sequence modeling tasks; they rely on attention to capture pairwise token relations and on position embeddings to inject positional information. Despite their good performance, transformer models scale inefficiently to long input sequences, mainly because attention has quadratic space-time complexity. To overcome this inefficiency, we propose to model sequences with a relative position encoded Toeplitz matrix and to use a Toeplitz matrix-vector product trick that reduces the space-time complexity of sequence modeling to log-linear. A lightweight sub-network, called the relative position encoder, generates the relative position coefficients with a fixed budget of parameters, enabling the proposed Toeplitz neural network to handle varying sequence lengths. In addition, despite being trained on 512-token sequences, our model can extrapolate to input sequences of up to 14K tokens at inference with consistent performance. Extensive experiments on autoregressive and bidirectional language modeling, image modeling, and the challenging Long-Range Arena benchmark show that our method outperforms its competitors on most downstream tasks while being significantly faster. The code is available at https://github.com/OpenNLPLab/Tnn.
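The matrix-vector product trick mentioned above is a standard one: an n x n Toeplitz matrix (entry T[i, j] depends only on the relative position i - j) can be embedded in a 2n x 2n circulant matrix, and a circulant matrix-vector product is a circular convolution that the FFT computes in O(n log n) time with O(n) memory. The sketch below illustrates that generic trick in Python with NumPy; it is not the authors' implementation. The names `toeplitz_matvec_fft`, `TinyRelPosEncoder` and `toeplitz_mix` are hypothetical, and the single-hidden-layer ReLU MLP in `TinyRelPosEncoder` is only an assumed stand-in for the paper's relative position encoder, chosen to show that the number of parameters does not grow with n. The `causal` flag zeroes the coefficients for future offsets to mimic the autoregressive setting.

```python
import numpy as np


def toeplitz_matvec_fft(col: np.ndarray, row: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return T @ x for the n x n Toeplitz matrix T[i, j] = t_{i-j} in O(n log n).

    col = T[:, 0] = (t_0, t_1, ..., t_{n-1})
    row = T[0, :] = (t_0, t_{-1}, ..., t_{-(n-1)})
    """
    n = x.shape[0]
    # First column of a 2n x 2n circulant matrix whose top-left n x n block is T:
    # [t_0, ..., t_{n-1}, 0, t_{-(n-1)}, ..., t_{-1}]
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    # A circulant matvec is a circular convolution, which the FFT diagonalizes.
    y = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(x_pad), n=2 * n)
    return y[:n]


class TinyRelPosEncoder:
    """Hypothetical stand-in for a relative position encoder: a fixed-size MLP
    mapping an offset k to a coefficient t_k. Its parameter count does not
    depend on the sequence length n (assumed architecture, for illustration)."""

    def __init__(self, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(1, hidden))
        self.b1 = rng.normal(size=hidden)
        self.w2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)

    def __call__(self, offsets: np.ndarray) -> np.ndarray:
        # One ReLU hidden layer applied independently to each offset.
        h = np.maximum(offsets[:, None].astype(float) @ self.w1 + self.b1, 0.0)
        return (h @ self.w2)[:, 0]


def toeplitz_mix(x: np.ndarray, rpe: TinyRelPosEncoder, causal: bool = False) -> np.ndarray:
    """Mix the tokens of a length-n sequence with a Toeplitz matrix built from rpe."""
    n = x.shape[0]
    col = rpe(np.arange(n))   # t_0, t_1, ..., t_{n-1}
    row = rpe(-np.arange(n))  # t_0, t_{-1}, ..., t_{-(n-1)}
    if causal:
        # Lower-triangular T: token i only sees tokens j <= i.
        row = np.concatenate([row[:1], np.zeros(n - 1)])
    return toeplitz_matvec_fft(col, row, x)


# Usage: compare against an explicit dense Toeplitz matrix.
n = 8
x = np.random.default_rng(1).normal(size=n)
rpe = TinyRelPosEncoder()
idx = np.arange(n)
T = rpe((idx[:, None] - idx[None, :]).ravel()).reshape(n, n)  # T[i, j] = t_{i-j}
print(np.allclose(toeplitz_mix(x, rpe), T @ x))
print(np.allclose(toeplitz_mix(x, rpe, causal=True), np.tril(T) @ x))
```

Because the encoder is evaluated on offsets rather than stored as a length-specific table, the same parameters yield coefficients for any n. Whether those coefficients remain useful at lengths far beyond training depends on the encoder itself; this toy MLP makes no such guarantee.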
