Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification

Hyperspectral image (HSI) classification is pivotal in the remote sensing (RS) field, particularly with the advancement of deep learning techniques. Sequential models adapted from natural language processing (NLP), such as Recurrent Neural Networks (RNNs) and Transformers, have been tailored to this task and offer a distinct perspective. However, several challenges persist: 1) RNNs struggle with centric feature aggregation and are sensitive to interfering pixels; 2) Transformers require significant computational resources and often underperform with limited HSI training samples; and 3) current scanning methods for converting images into sequence data are simplistic and inefficient. In response, this study introduces the Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt to deploy a State Space Model (SSM) in this task. The MiM model includes: 1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence data; 2) a Tokenized Mamba (T-Mamba) encoder that incorporates a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) for enhanced feature generation and concentration; and 3) a Weighted MCS Fusion (WMF) module coupled with a Multi-Scale Loss Design to improve decoding efficiency. Experimental results on three public HSI datasets with fixed and disjoint training-testing samples demonstrate that our method outperforms existing baselines and state-of-the-art approaches, highlighting its efficacy and potential in HSI applications.
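To make the idea of a Gaussian Decay Mask more concrete, the following is a minimal sketch of one plausible reading: pixels in an HSI patch are down-weighted according to their spatial distance from the center pixel, so that interfering pixels far from the center contribute less. The function name `gaussian_decay_mask`, the parameters `patch_size` and `sigma`, and the choice to apply the mask over a 2D patch are all assumptions made for illustration; the abstract does not specify how the GDM is defined or where it is applied.

```python
import numpy as np


def gaussian_decay_mask(patch_size: int, sigma: float) -> np.ndarray:
    """Hypothetical helper: (patch_size, patch_size) weights peaking at the center pixel."""
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so that a center pixel exists")
    center = patch_size // 2
    # Row and column index grids covering the patch.
    rows, cols = np.mgrid[0:patch_size, 0:patch_size]
    # Squared Euclidean distance of every pixel from the center pixel.
    sq_dist = (rows - center) ** 2 + (cols - center) ** 2
    # Gaussian decay: weight 1.0 at the center, smaller further away.
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Illustrative values only: a 7x7 patch with 30 spectral bands (assumed sizes).
mask = gaussian_decay_mask(patch_size=7, sigma=2.0)
patch = np.random.default_rng(0).random((7, 7, 30))
# Broadcast the spatial mask across the spectral dimension.
weighted = patch * mask[:, :, None]
```

In this sketch a smaller `sigma` concentrates the weight more tightly on the center pixel, while a larger `sigma` lets neighboring pixels contribute more.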
