Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have emerged as powerful tools for language modeling, offering high performance with efficient inference and linear scaling in sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely unexplored. This paper aims to systematically study two key questions: (i) How do existing PEFT methods perform on SSM-based models? (ii) Which modules are most effective for fine-tuning? We conduct an empirical benchmark of four basic PEFT methods on SSM-based models. Our findings reveal that prompt-based methods (e.g., prefix-tuning) are no longer effective, an empirical result further supported by theoretical analysis. In contrast, LoRA remains effective for SSM-based models. We further investigate the optimal application of LoRA within these models, demonstrating both theoretically and experimentally that applying LoRA to linear projection matrices without modifying SSM modules yields the best results, as LoRA is not effective at tuning SSM modules. To further improve performance, we introduce LoRA with Selective Dimension tuning (SDLoRA), which selectively updates certain channels and states on SSM modules while applying LoRA to linear projection matrices. Extensive experimental results show that this approach outperforms standard LoRA.
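The LoRA recipe the abstract recommends, adapting only the linear projection matrices while leaving the SSM module's parameters frozen, can be sketched as follows. This is a minimal NumPy illustration of the low-rank update itself; the shapes and names are illustrative and not taken from the paper's code.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=1.0):
    """Forward pass of a LoRA-adapted linear layer:
    y = x W^T + (alpha / r) * x A^T B^T.
    W (out, in) is the frozen pretrained weight; A (r, in) and B (out, r)
    are the trainable low-rank factors. B is zero-initialized, so the
    adapted layer matches the base layer exactly at the start of tuning."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))   # frozen projection weight
A = rng.standard_normal((r, d_in))       # trainable
B = np.zeros((d_out, r))                 # trainable, zero-init
x = rng.standard_normal((4, d_in))

# At initialization the adapter is a no-op: output equals the base layer.
assert np.allclose(lora_linear(x, W, A, B), x @ W.T)
```

Only `A` and `B` (2 * r * d parameters per layer) receive gradients; the base weight `W`, and under the paper's recommendation the SSM module's state matrices, stay frozen.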