State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences. To achieve state-of-the-art performance, an SSM often needs a specifically designed initialization, and its state matrices must be trained on a logarithmic scale with a very small learning rate. To understand these choices from a unified perspective, we view SSMs through the lens of Hankel operator theory. Building on this theory, we develop a new parameterization scheme for LTI systems, called HOPE, that utilizes the Markov parameters within Hankel operators. Our approach improves initialization and training stability, leading to a more robust parameterization. We implement these innovations efficiently by nonuniformly sampling the transfer functions of LTI systems, which also requires fewer parameters than canonical SSMs. When benchmarked against HiPPO-initialized models such as S4 and S4D, an SSM parameterized by Hankel operators demonstrates improved performance on Long-Range Arena (LRA) tasks. Moreover, our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
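To make the Hankel-operator view concrete, the sketch below builds the Markov parameters h_k = C A^k B of a small discrete LTI system and assembles them into a Hankel matrix H[i, j] = h_{i+j}. The system sizes and values here are illustrative assumptions, not the paper's HOPE parameterization; the point is only that the Hankel matrix is fully determined by the Markov parameters and has rank at most the state dimension.

```python
import numpy as np

# Illustrative sketch (assumed toy system, not the paper's model):
# a discrete LTI system (A, B, C) with state dimension n.
rng = np.random.default_rng(0)
n = 4
A = np.diag(rng.uniform(-0.9, 0.9, size=n))  # stable diagonal state matrix
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

def markov_parameters(A, B, C, L):
    """First L Markov parameters h_k = C A^k B, k = 0, ..., L-1."""
    h, x = [], B
    for _ in range(L):
        h.append((C @ x).item())
        x = A @ x
    return np.array(h)

h = markov_parameters(A, B, C, 8)

# Hankel matrix built from the Markov parameters: H[i, j] = h_{i+j}.
H = np.array([[h[i + j] for j in range(4)] for i in range(4)])

# The Hankel matrix of an order-n LTI system has rank at most n.
assert np.linalg.matrix_rank(H) <= n
```

The Markov parameters are exactly the system's impulse response, so the Hankel matrix encodes the input-output map of the LTI system without referencing any particular realization (A, B, C); this is what makes it a natural object to parameterize directly.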