Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have emerged as powerful tools for language modeling, offering high performance with efficient inference and linear scaling in sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely unexplored. This paper aims to systematically study two key questions: (i) How do existing PEFT methods perform on SSM-based models? (ii) Which modules are most effective for fine-tuning? We conduct an empirical benchmark of four basic PEFT methods on SSM-based models. Our findings reveal that prompt-based methods (e.g., prefix-tuning) are no longer effective, an empirical result further supported by theoretical analysis. In contrast, LoRA remains effective for SSM-based models. We further investigate the optimal application of LoRA within these models, demonstrating both theoretically and experimentally that applying LoRA to linear projection matrices without modifying SSM modules yields the best results, as LoRA is not effective at tuning SSM modules. To further improve performance, we introduce LoRA with Selective Dimension tuning (SDLoRA), which selectively updates certain channels and states on SSM modules while applying LoRA to linear projection matrices. Extensive experimental results show that this approach outperforms standard LoRA.
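The LoRA recipe the abstract recommends, adapting only the linear projection matrices while leaving the SSM module's parameters frozen, can be sketched as follows. This is a minimal NumPy illustration of the low-rank update itself; the shapes and names are illustrative and not taken from the paper's code.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=1.0):
    """Forward pass of a LoRA-adapted linear layer:
    y = x W^T + (alpha / r) * x A^T B^T.
    W (out, in) is the frozen pretrained weight; A (r, in) and B (out, r)
    are the trainable low-rank factors. B is zero-initialized, so the
    adapted layer matches the base layer exactly at the start of tuning."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))   # frozen projection weight
A = rng.standard_normal((r, d_in))       # trainable
B = np.zeros((d_out, r))                 # trainable, zero-init
x = rng.standard_normal((4, d_in))

# At initialization the adapter is a no-op: output equals the base layer.
assert np.allclose(lora_linear(x, W, A, B), x @ W.T)
```

Only `A` and `B` (2 * r * d parameters per layer) receive gradients; the base weight `W`, and under the paper's recommendation the SSM module's state matrices, stay frozen.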