Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance to the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs compared with that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify our results empirically on a recent SSM, Mamba.
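To illustrate the star-free state-tracking claim, here is a minimal sketch (not taken from the paper) of how an input-gated linear recurrence of the form h_t = a(x_t)·h_{t-1} + b(x_t), the core of Mamba-style gated SSMs, can exactly solve the flip-flop task, a classic star-free state-tracking problem. The task encoding and the function name `flipflop_ssm` are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: a one-dimensional input-gated linear SSM solving
# the flip-flop task (a star-free state-tracking problem): remember the
# argument of the most recent "write" instruction. All names and the
# task encoding here are assumptions for illustration, not the paper's.

def flipflop_ssm(tokens):
    """tokens: sequence of ('write', bit) or ('read', None) instructions."""
    h = 0.0  # scalar state: the last written bit
    outputs = []
    for op, bit in tokens:
        if op == 'write':
            a, b = 0.0, float(bit)   # gate erases the old state, loads the new bit
        else:
            a, b = 1.0, 0.0          # gate preserves the state exactly
        h = a * h + b                # input-gated linear recurrence
        outputs.append(h if op == 'read' else None)
    return outputs

# Example: write 1, read, write 0, read -> the reads return 1.0 then 0.0.
print(flipflop_ssm([('write', 1), ('read', None),
                    ('write', 0), ('read', None)]))
```

Because the gates a(x_t) take only the values 0 and 1, the recurrence tracks the automaton state exactly rather than approximately, which is the sense in which such solutions are "straightforward and exact".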