Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance in the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs compared to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory, even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify our results empirically on a recent SSM, Mamba.