State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and outline future work extending our results to deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space models.
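As a point of reference, a minimal sketch of the setting the abstract refers to, using standard discrete-time linear SSM notation (the symbols $A$, $B$, $C$, $u_t$, $x_t$, $y_t$ are our assumption here, not necessarily the paper's):

\begin{align}
  x_{t+1} &= A x_t + B u_t, \qquad y_t = C x_t, \\
  \hat{y}(\omega) &= \underbrace{C\,\bigl(e^{i\omega} I - A\bigr)^{-1} B}_{H(\omega)} \,\hat{u}(\omega),
\end{align}

where the second line follows from evaluating the system's transfer function on the unit circle. Because the input-output map acts independently at each frequency $\omega$, analyzing the gradient-descent dynamics frequency by frequency is what can make analytical solutions tractable.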