As one of the most representative deep learning techniques, the Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) comprising billions of parameters, and has become a cornerstone of deep learning. Despite these impressive achievements, Transformers still face inherent limitations, particularly time-consuming inference resulting from the quadratic computational complexity of attention. Recently, a novel architecture named Mamba, drawing inspiration from classical state space models (SSMs), has emerged as a promising alternative for building foundation models, delivering modeling ability comparable to Transformers while preserving near-linear scalability with respect to sequence length. This has sparked a growing number of studies actively exploring Mamba's potential to achieve impressive performance across diverse domains. Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models and offers a comprehensive understanding of this emerging architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-related studies, covering three main aspects: the advancements of Mamba-based models, the techniques for adapting Mamba to diverse data, and the applications in which Mamba can excel. Specifically, we first review the foundational knowledge of various representative deep learning models and the details of Mamba-1 and Mamba-2 as preliminaries. Then, to showcase the significance of Mamba for AI, we comprehensively review the related studies focusing on Mamba models' architecture design, data adaptability, and applications. Finally, we discuss current limitations and explore various promising research directions to provide deeper insights for future investigations.
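The near-linear scaling mentioned above stems from the recurrent form of state space models: each output depends only on a fixed-size hidden state, so the sequence is processed in a single left-to-right scan. The following is a minimal sketch of a discretized linear SSM scan, not Mamba's selective, hardware-aware implementation; the matrices `A`, `B`, `C` and the function name `ssm_scan` are illustrative placeholders.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discretized linear state space model over an input sequence.

    h_t = A h_{t-1} + B u_t,    y_t = C h_t

    Each step costs O(N^2) for state size N, so a sequence of length L
    costs O(L * N^2) -- linear in L, in contrast to the O(L^2) pairwise
    score matrix computed by self-attention.
    """
    N = A.shape[0]
    h = np.zeros(N)          # fixed-size hidden state
    ys = []
    for u_t in u:            # single left-to-right scan over the sequence
        h = A @ h + B * u_t  # state update (B has shape (N,), scalar input)
        ys.append(C @ h)     # readout
    return np.array(ys)

# Toy usage: a stable diagonal transition over a short random sequence.
rng = np.random.default_rng(0)
N, L = 4, 8
A = 0.9 * np.eye(N)
B = rng.standard_normal(N)
C = rng.standard_normal(N)
y = ssm_scan(A, B, C, rng.standard_normal(L))
print(y.shape)  # (8,)
```

Mamba departs from this time-invariant sketch by making the transition parameters input-dependent (selective), while retaining the same linear-in-length scan structure.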