A Survey of Mamba

As one of the most representative deep learning (DL) techniques, the Transformer architecture has empowered numerous advanced models, especially large language models (LLMs) comprising billions of parameters, and has become a cornerstone of deep learning. Despite these impressive achievements, Transformers still face inherent limitations, particularly the time-consuming inference that results from the quadratic computational complexity of attention. Recently, a novel architecture named Mamba, which draws inspiration from classical state space models (SSMs), has emerged as a promising alternative for building foundation models, delivering modeling ability comparable to Transformers while preserving near-linear scalability with respect to sequence length. This has sparked a growing number of studies exploring Mamba's potential to achieve impressive performance across diverse domains. Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models and offers a comprehensive understanding of this emerging architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: advances in Mamba-based models, techniques for adapting Mamba to diverse data, and applications where Mamba can excel. Specifically, we first review the foundational knowledge of representative deep learning models and the details of Mamba-1 and Mamba-2 as preliminaries. Then, to showcase the significance of Mamba for AI, we comprehensively review related studies on the architecture design, data adaptability, and applications of Mamba models. Finally, we discuss current limitations and explore promising research directions to provide deeper insights for future investigations.
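To make the scaling contrast in the abstract concrete, the following is a minimal sketch of a classical linear state space recurrence, not Mamba itself. It assumes a one-dimensional state, fixed parameters `a`, `b`, `c`, and a zero-order-hold discretization with step `dt`; all of these names and values are illustrative and do not come from the survey. Mamba additionally makes its parameters depend on the input, which this time-invariant sketch omits.

```python
import math


def ssm_scan(u, a=-0.5, b=1.0, c=1.0, dt=0.1):
    """Run a scalar linear SSM over the sequence u in a single pass.

    Continuous form (illustrative): h'(t) = a*h(t) + b*u(t), y(t) = c*h(t).
    Zero-order-hold discretization with step dt (requires a != 0):
        a_bar = exp(dt * a)
        b_bar = (a_bar - 1) / a * b
        h_k   = a_bar * h_{k-1} + b_bar * u_k
    """
    a_bar = math.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    h = 0.0
    ys = []
    for u_k in u:  # one constant-cost update per token: O(L) overall
        h = a_bar * h + b_bar * u_k
        ys.append(c * h)
    return ys


def attention_score_count(length):
    """Query-key scores computed by full self-attention: one per token pair."""
    return length * length


def ssm_update_count(length):
    """State updates performed by the recurrence above: one per token."""
    return length


if __name__ == "__main__":
    for length in (1_000, 10_000):
        print(length, attention_score_count(length), ssm_update_count(length))
```

Under these assumptions, multiplying the sequence length by ten multiplies the number of attention scores by one hundred but the number of recurrent updates only by ten, which is the sense in which an SSM-style model scales near-linearly with sequence length.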
