Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely underexplored. We begin by investigating two fundamental questions about existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for the best results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective on linear projection matrices, it fails on SSM modules; even so, it still outperforms the methods that are directly applicable to SSMs, exposing their limitations. This underscores the need for a tuning approach specialized for SSMs. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored to SSM modules. Combining SDT for SSM modules with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
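The abstract does not spell out SDT's exact formulation, but the two ingredients it combines can be sketched at a high level. The NumPy snippet below is a minimal, illustrative sketch only: it contrasts LoRA (a low-rank additive update to a frozen projection matrix) with a sparse-dimension-style update (modifying only a chosen subset of dimensions of an SSM parameter). All variable names and the dimension-selection rule are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size, LoRA rank (illustrative values)

# --- LoRA on a frozen linear projection: W_eff = W + B @ A ---
W = rng.standard_normal((d, d))       # frozen pretrained weight
A = np.zeros((r, d))                  # trainable low-rank factor (init to zero)
B = rng.standard_normal((d, r)) * 0.01  # trainable low-rank factor
W_eff = W + B @ A                     # A is zero, so W_eff == W at init

# --- Sparse-dimension-style tuning (illustrative): update only a few
# dimensions of a per-channel SSM parameter, leaving the rest frozen ---
A_ssm = rng.standard_normal((d,))     # stand-in per-dimension SSM parameter
trainable_dims = np.array([1, 4])     # hypothetical selected dimensions
grad = rng.standard_normal((d,))      # stand-in gradient from backprop
lr = 0.1
delta = np.zeros_like(A_ssm)
delta[trainable_dims] -= lr * grad[trainable_dims]  # sparse update only
A_tuned = A_ssm + delta

# Trainable parameter count: 2 * r * d (LoRA) + len(trainable_dims) (sparse),
# far below the d * d + d parameters of full fine-tuning in this toy setup.
```

The key property both methods share is that the frozen parameters are untouched: the LoRA update starts at zero (so the model is unchanged at initialization), and the sparse update leaves every unselected dimension exactly equal to its pretrained value.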