Fine-tuning Transformer-based foundation models has become the dominant strategy for domain adaptation in audio and speech processing. To reduce the computational and memory costs of fine-tuning, parameter-efficient transfer learning (PETL) methods have been widely explored. Meanwhile, Mamba, a recent state-space model, has emerged as a promising alternative to Transformers for sequence modeling. In this work, we present MambAdapter, a PETL approach that integrates Mamba into low-rank bottleneck adapters. Our design combines parameter sharing across adapters with the injection of a lightweight Mamba module, enabling more effective modeling of audio features. We demonstrate that MambAdapter matches or outperforms strong PETL baselines on four audio classification tasks and on speech recognition in five languages, even when operating under reduced parameter budgets.
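To make the parameter-budget argument concrete, the following minimal sketch counts the trainable parameters of standard low-rank bottleneck adapters, with and without sharing the projections across layers. It is an illustration under stated assumptions, not the MambAdapter implementation: the function names, the choice to share both projections in full, and the values `d_model=768`, `rank=8`, and `num_layers=12` are hypothetical and are not taken from the paper. The added Mamba module is not counted.

```python
def bottleneck_adapter_params(d_model: int, rank: int) -> int:
    """Trainable parameters of one bottleneck adapter.

    Assumes a down-projection (d_model -> rank) and an up-projection
    (rank -> d_model), each with a bias term.
    """
    down = d_model * rank + rank      # weights + bias
    up = rank * d_model + d_model     # weights + bias
    return down + up


def total_adapter_params(d_model: int, rank: int, num_layers: int, shared: bool) -> int:
    """Total adapter parameters over all layers.

    shared=True is an assumed scheme in which a single pair of projections
    is reused by every layer; shared=False gives each layer its own pair.
    """
    copies = 1 if shared else num_layers
    return copies * bottleneck_adapter_params(d_model, rank)


# Illustrative (hypothetical) configuration, not taken from the paper.
independent = total_adapter_params(d_model=768, rank=8, num_layers=12, shared=False)
shared = total_adapter_params(d_model=768, rank=8, num_layers=12, shared=True)
print(independent, shared)
```

Under these assumptions, full sharing divides the projection cost by the number of layers, which leaves room in the budget for an additional lightweight sequence module.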