We study the expressive power and limitations of multi-layer state-space models (SSMs). First, we show that multi-layer SSMs face fundamental limitations on compositional tasks, revealing an inherent gap between SSMs and streaming models. Then, we examine the role of chain-of-thought (CoT), showing that offline CoT does not fundamentally increase their expressiveness, whereas online CoT can increase it substantially. Indeed, with online CoT, multi-layer SSMs become equivalent in power to streaming algorithms. Finally, we investigate the tradeoff between width and precision, showing that these two resources are not interchangeable in the base model but admit a clean equivalence once online CoT is allowed. Overall, our results offer a unified perspective on how depth, finite precision, and CoT shape the power and limits of SSMs.