Deep sequence models, ranging from Transformers and State Space Models (SSMs) to more recent approaches such as gated linear RNNs, fundamentally compute outputs as linear combinations of past value vectors. To draw insights and systematically compare such architectures, we develop a unified framework that makes this output operation explicit by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, substantially different in spirit from approaches that focus on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures and, crucially, captures softmax attention in addition to RNNs, SSMs, and related models. In contrast to new model proposals, which are commonly evaluated only on benchmarks, we derive design principles linking architectural choices to model properties, thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention. By connecting several insights and observations from the recent literature, the framework both explains the empirical successes of recent designs and provides guiding principles for systematically designing new sequence model architectures.
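As a minimal illustration of the formulation summarized above (the symbols $y_t$, $v_s$, $c_{t,s}$, $h_{t,s}$, $A_t$, $b_s$, and $q_t$ are notation assumed here for exposition, not taken from the paper), the output at step $t$ is a linear combination of past value vectors,
$$
y_t = \sum_{s=1}^{t} c_{t,s}\, v_s ,
$$
and, under this reading, each coefficient trajectory $c_{t,s}$ (fixed $s$, varying $t \ge s$) can be viewed as the output of an autonomous linear dynamical system whose state is set by an impulse at time $s$, e.g.
$$
h_{s,s} = b_s, \qquad h_{t,s} = A_t\, h_{t-1,s} \ \ (t > s), \qquad c_{t,s} = q_t^{\top} h_{t,s}.
$$
Different architectures then correspond to different choices of the transition $A_t$, the impulse $b_s$, and the readout $q_t$; this sketch is only meant to convey the general shape of the framework, not its exact definitions.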