SaaFormer: Spectral-spatial Axial Aggregation Transformer for Hyperspectral Image Classification

Hyperspectral images (HSI) captured by Earth-observing satellites and aircraft are becoming increasingly important for applications in agriculture, environmental monitoring, mining, and other fields. Because few hyperspectral datasets are available, pixel-wise random sampling is the most common way to partition them into training and test sets, yet it produces significant overlap between training and test samples. Furthermore, our experimental observations indicate that regions with larger overlap often exhibit higher classification accuracy. Pixel-wise random sampling therefore poses a risk of data leakage. To minimize this risk, we propose a block-wise sampling method. Our experimental findings also confirm the presence of data leakage in models such as 2DCNN. We further propose a spectral-spatial axial aggregation transformer, named SaaFormer, to address the challenges of hyperspectral image classification when HSI are treated as long sequential three-dimensional images. The model comprises two primary components: axial aggregation attention and multi-level spectral-spatial extraction. The axial aggregation attention mechanism exploits the continuity and correlation among spectral bands at each pixel position while aggregating features along the spatial dimensions, which enables SaaFormer to maintain high accuracy even under block-wise sampling. The multi-level spectral-spatial extraction structure is designed to capture the sensitivity of different material components to specific spectral bands, allowing the model to attend to a broader range of spectral details. Results on six publicly available datasets show that our model performs comparably to other methods under random sampling and significantly outperforms them under block-wise sampling.
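The abstract does not spell out how the block-wise partition is built, so the following is only a minimal sketch under our own assumptions. The label map is tiled into non-overlapping square tiles, each tile goes wholly to the training or the test set, and "overlap" is measured as the fraction of test pixels whose (2r+1) x (2r+1) input patch contains at least one training pixel. The names `pixelwise_split`, `blockwise_split`, `leakage_fraction`, `block`, and `radius` are hypothetical and are not the authors' code.

```python
import numpy as np


def pixelwise_split(labels, train_ratio, rng):
    """Assign each labeled pixel (label > 0) to train or test independently of its neighbors."""
    coords = np.argwhere(labels > 0)
    perm = rng.permutation(len(coords))
    n_train = int(round(train_ratio * len(coords)))
    train_mask = np.zeros(labels.shape, dtype=bool)
    test_mask = np.zeros(labels.shape, dtype=bool)
    tr, te = coords[perm[:n_train]], coords[perm[n_train:]]
    train_mask[tr[:, 0], tr[:, 1]] = True
    test_mask[te[:, 0], te[:, 1]] = True
    return train_mask, test_mask


def blockwise_split(labels, block, train_ratio, rng):
    """Assign whole non-overlapping block x block tiles to train or test (hypothetical scheme).

    Each tile goes to the training set with probability train_ratio, so the
    realized ratio is only approximate.
    """
    h, w = labels.shape
    n_by, n_bx = -(-h // block), -(-w // block)  # ceil division covers ragged edges
    train_tiles = rng.random((n_by, n_bx)) < train_ratio
    # Expand the per-tile decision to a per-pixel map and crop it to the image size.
    tile_map = np.repeat(np.repeat(train_tiles, block, axis=0), block, axis=1)[:h, :w]
    labeled = labels > 0
    return labeled & tile_map, labeled & ~tile_map


def leakage_fraction(train_mask, test_mask, radius):
    """Fraction of test pixels whose (2r+1) x (2r+1) patch contains a training pixel."""
    h, w = train_mask.shape
    k = 2 * radius + 1
    padded = np.pad(train_mask, radius)  # False padding outside the scene
    # Integral image: ii[i, j] = number of training pixels in padded[:i, :j].
    ii = np.zeros((h + k, w + k), dtype=np.int64)
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Training-pixel count in the k x k window centred on every original pixel.
    window = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    hits = window[test_mask] > 0
    return float(hits.mean()) if hits.size else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(1, 5, size=(120, 120))  # synthetic, fully labeled 120 x 120 map
    tr_p, te_p = pixelwise_split(labels, 0.1, rng)
    tr_b, te_b = blockwise_split(labels, 20, 0.1, rng)
    print(leakage_fraction(tr_p, te_p, radius=5))  # close to 1.0
    print(leakage_fraction(tr_b, te_b, radius=5))  # at most 0.75 for 20 x 20 tiles, radius 5
```

Under pixel-wise sampling nearly every test patch contains a training pixel, whereas under this block-wise scheme only test pixels within `radius` of a training tile do; the 10x10 interior of each 20x20 test tile can never overlap. The tile size, radius, and synthetic labels are illustrative choices, not values from the paper.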
