In the post-deep-learning era, the Transformer architecture has demonstrated strong performance across large pre-trained models and a wide range of downstream tasks. However, its enormous computational demands have deterred many researchers. To reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), a possible replacement for the self-attention-based Transformer, has drawn increasing attention in recent years. In this paper, we give the first comprehensive review of these works and provide experimental comparisons and analysis to better demonstrate the features and advantages of SSMs. Specifically, we first describe the underlying principles in detail to help readers quickly grasp the key ideas of SSMs. We then review existing SSMs and their applications, including natural language processing, computer vision, graphs, multi-modal and multimedia data, point cloud/event stream data, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models, which we hope will help readers understand the effectiveness of different structures on various tasks. Finally, we propose possible research directions to better promote the development of both the theory and the applications of SSMs. More related works will be continuously updated at the following GitHub repository: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.