Transformers have achieved impressive results in hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges in HSI scenes characterized by diverse land-cover types and rich spectral information: (1) fixed receptive fields overlook effective contextual information; and (2) self-attention produces redundant feature representations. To address these limitations, we propose a novel Selective Transformer (SFormer) for HSI classification. SFormer dynamically selects receptive fields to capture both spatial and spectral contextual information, while mitigating the impact of redundant data by prioritizing the most relevant features, enabling highly accurate classification of HSI land covers. Specifically, a Kernel Selective Transformer Block (KSTB) first selects an appropriate receptive-field range to effectively extract spatial-spectral features. Then, to capture the most crucial tokens, a Token Selective Transformer Block (TSTB) selects, for each query, the most relevant tokens based on the ranking of their attention scores. Extensive experiments on four benchmark HSI datasets demonstrate that the proposed SFormer outperforms state-of-the-art HSI classification models. The code will be released.
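The token-selection idea behind the TSTB can be illustrated with a minimal sketch: for each query, keep only the top-k highest-scoring keys and renormalize the softmax over that subset. This is a hypothetical illustration of the general top-k attention technique, not the paper's exact TSTB; the function name and shapes are assumptions.

```python
import numpy as np

def topk_selective_attention(q, k, v, top_k):
    """Top-k token-selective attention sketch (hypothetical illustration).

    For each query, only the top_k keys with the highest attention logits
    are retained; the softmax is computed over that subset, so weights of
    the discarded (redundant) tokens are exactly zero.
    q: (n_q, d), k: (n_k, d), v: (n_k, d_v); returns (n_q, d_v).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_k) logits
    # Indices of the top_k keys per query (unordered partition is enough).
    idx = np.argpartition(-scores, top_k - 1, axis=-1)[:, :top_k]
    # Mask: keep selected logits, set the rest to -inf (weight 0 after exp).
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    # Numerically stable softmax over the selected tokens only.
    masked -= masked.max(axis=-1, keepdims=True)
    w = np.exp(masked)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `top_k` equal to the number of keys this reduces to standard softmax attention, so the selection can be seen as a sparsification of the attention map.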