Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely underexplored. We begin by investigating two fundamental questions about existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for the best results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective on linear projection matrices, it fails on SSM modules; even so, it still outperforms the methods that are directly applicable to SSMs, exposing their limitations. This underscores the need for a tuning approach specialized for SSMs. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored to SSM modules. Combining SDT for SSM modules with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
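The abstract does not spell out SDT's exact formulation, but the two ingredients it combines can be sketched at a high level. The NumPy snippet below is a minimal, illustrative sketch only: it contrasts LoRA (a low-rank additive update to a frozen projection matrix) with a sparse-dimension-style update (modifying only a chosen subset of dimensions of an SSM parameter). All variable names and the dimension-selection rule are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size, LoRA rank (illustrative values)

# --- LoRA on a frozen linear projection: W_eff = W + B @ A ---
W = rng.standard_normal((d, d))       # frozen pretrained weight
A = np.zeros((r, d))                  # trainable low-rank factor (init to zero)
B = rng.standard_normal((d, r)) * 0.01  # trainable low-rank factor
W_eff = W + B @ A                     # A is zero, so W_eff == W at init

# --- Sparse-dimension-style tuning (illustrative): update only a few
# dimensions of a per-channel SSM parameter, leaving the rest frozen ---
A_ssm = rng.standard_normal((d,))     # stand-in per-dimension SSM parameter
trainable_dims = np.array([1, 4])     # hypothetical selected dimensions
grad = rng.standard_normal((d,))      # stand-in gradient from backprop
lr = 0.1
delta = np.zeros_like(A_ssm)
delta[trainable_dims] -= lr * grad[trainable_dims]  # sparse update only
A_tuned = A_ssm + delta

# Trainable parameter count: 2 * r * d (LoRA) + len(trainable_dims) (sparse),
# far below the d * d + d parameters of full fine-tuning in this toy setup.
```

The key property both methods share is that the frozen parameters are untouched: the LoRA update starts at zero (so the model is unchanged at initialization), and the sparse update leaves every unselected dimension exactly equal to its pretrained value.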