State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Because the computational cost of self-attention in transformers grows quadratically with image size, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in computer vision. We begin by exploring the foundational concepts behind Mamba's success: the state space model framework, selection mechanisms, and hardware-aware design. Next, we review vision Mamba models, categorizing them into foundational models and models enhanced with techniques such as convolution, recurrence, and attention. We then examine the widespread applications of Mamba in vision tasks, including its use as a backbone at various levels of vision processing. These applications span general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. We present general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, and video classification) and low-level vision (e.g., image super-resolution, image restoration, and visual generation). We hope this endeavor will spark additional interest within the community in addressing current challenges and further applying Mamba models in computer vision.