In the hyperspectral image classification (HSIC) task, the most commonly used model validation paradigm partitions the training and test datasets by pixel-wise random sampling. By training on a small amount of data, deep learning models can achieve almost perfect accuracy. However, in our experiments, we found that this high accuracy arises because the training and test datasets share a large amount of information. On non-overlapping dataset partitions, well-performing models suffer significant performance degradation. To this end, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, that preserves generalization across dataset partitions. SaaFormer applies a multi-level spectral extraction structure to segment the spectrum into multiple spectrum clips, so that the wavelength continuity of the spectrum across channels is preserved. For each spectrum clip, an axial aggregation attention mechanism, which integrates spatial features along multiple spectral axes, is applied to mine spectral characteristics. The multi-level spectral extraction and the axial aggregation attention together emphasize spectral characteristics to improve model generalization. Experimental results on five publicly available datasets demonstrate that our model achieves performance comparable to other methods under random partitioning, while significantly outperforming them on non-overlapping partitions. Moreover, SaaFormer shows excellent performance on background classification.
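The two components named above (band segmentation into contiguous clips and per-clip attention along spatial axes) can be illustrated with a minimal numpy sketch. This is a simplified, hypothetical rendering for intuition only, not the authors' implementation: the clip count, the single-head attention, and the function names `split_spectral_clips`-style helpers are assumptions, and learned projections, positional encodings, and the multi-level hierarchy are omitted.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax for the attention weights.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis):
    """Toy single-head self-attention along one spatial axis.

    x: (H, W, D) spectrum clip with D contiguous bands as features;
    axis: 0 attends along rows, 1 attends along columns.
    """
    xm = np.moveaxis(x, axis, -2)                      # (..., L, D)
    scores = xm @ np.swapaxes(xm, -1, -2)              # (..., L, L)
    out = softmax(scores / np.sqrt(xm.shape[-1])) @ xm
    return np.moveaxis(out, -2, axis)

def saaformer_sketch(cube, n_clips):
    """Split the band axis into contiguous clips (keeping wavelength
    continuity within each clip), run axial attention along both
    spatial axes per clip, then re-concatenate the clip features."""
    clips = np.array_split(cube, n_clips, axis=2)
    feats = [axial_attention(axial_attention(c, 0), 1) for c in clips]
    return np.concatenate(feats, axis=2)
```

In a real model each clip would pass through learned query/key/value projections and several such blocks, but the sketch shows the key design choice: attention mixes pixels only along one spatial axis at a time, per spectrum clip, so spectral structure dominates the representation.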