State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model training. We show that the initialization of an SSM assigns it an innate frequency bias and that training the model in a conventional way does not alter this bias. Based on our theory, we propose two mechanisms to tune frequency bias: scaling the initialization to adjust the innate bias, or applying a Sobolev-norm-based filter to adjust the sensitivity of the gradients to high-frequency inputs, which allows us to change the frequency bias via training. Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.
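As a rough illustration of the two mechanisms, the PyTorch sketch below shows how one might scale the imaginary parts of an S4D-Lin-style diagonal initialization and reweight a training loss with a Sobolev-norm-style frequency filter. This is our own minimal sketch: the function names and the hyperparameters `alpha` and `beta` are illustrative assumptions, not the paper's actual API or implementation.

```python
import torch

def init_diagonal_ssm(state_dim: int, alpha: float = 1.0) -> torch.Tensor:
    # S4D-Lin-style diagonal state matrix A_n = -1/2 + i*pi*n.
    # `alpha` (an illustrative knob, not the paper's exact parameter) scales
    # the imaginary parts: alpha > 1 spreads the poles toward higher
    # frequencies at initialization; alpha < 1 concentrates them at lower ones.
    n = torch.arange(state_dim, dtype=torch.float32)
    return torch.complex(torch.full((state_dim,), -0.5), alpha * torch.pi * n)

def sobolev_filtered_loss(pred: torch.Tensor, target: torch.Tensor,
                          beta: float = 0.0) -> torch.Tensor:
    # Compare output sequences in the frequency domain, weighting the k-th
    # angular frequency omega_k by (1 + omega_k**2)**(beta / 2), a
    # Sobolev-norm-style filter. beta > 0 amplifies gradients coming from
    # high-frequency errors, beta < 0 damps them, and beta = 0 is
    # proportional to a plain L2 loss (by Parseval's theorem).
    L = pred.shape[-1]
    omega = 2 * torch.pi * torch.fft.rfftfreq(L)   # angular freqs in [0, pi]
    weight = (1.0 + omega**2) ** (beta / 2)
    err = torch.fft.rfft(pred - target, dim=-1)
    return (weight * err.abs() ** 2).mean()
```

For example, `init_diagonal_ssm(64, alpha=2.0)` would bias a 64-state model toward higher frequencies from the start, while training with `sobolev_filtered_loss(..., beta=-1.0)` would further suppress gradient contributions from high-frequency errors.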