Hyperspectral images (HSIs) contain rich spectral and spatial information. Motivated by the success of transformers in natural language processing and computer vision, where they have shown the ability to learn long-range dependencies within input data, recent research has focused on applying transformers to HSIs. However, current state-of-the-art hyperspectral transformers tokenize the input HSI sample only along the spectral dimension, which under-utilizes spatial information. Moreover, transformers are known to be data-hungry, and their performance relies heavily on large-scale pretraining, which is challenging because annotated hyperspectral data is limited. As a result, the potential of HSI transformers has not been fully realized. To overcome these limitations, we propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures, leading to significant improvements in performance. Factorizing the inputs allows the spectral and spatial transformers to better capture the interactions within the hyperspectral data cubes. Inspired by masked image modeling pretraining, we also devise efficient masking strategies for pretraining each of the spectral and spatial transformers. We conduct experiments on six publicly available datasets for the HSI classification task and demonstrate that our model achieves state-of-the-art performance on all of them. The code for our model will be made available at https://github.com/csiro-robotics/factoformer.
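To make the idea of factorized tokenization and masking concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an HSI sample is a cube of shape (bands, height, width), forms one token per band for a spectral branch and one token per pixel for a spatial branch, and masks a random subset of tokens in each. The function names, the 200 x 7 x 7 patch size, the 0.5 mask ratio, and the choice to zero out masked tokens are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def spectral_tokens(cube):
    """Hypothetical spectral tokenization: one token per band.

    (bands, H, W) -> (bands, H*W)
    """
    bands, h, w = cube.shape
    return cube.reshape(bands, h * w)


def spatial_tokens(cube):
    """Hypothetical spatial tokenization: one token per pixel.

    (bands, H, W) -> (H*W, bands)
    """
    bands, h, w = cube.shape
    return cube.reshape(bands, h * w).T


def random_mask(tokens, mask_ratio, rng):
    """Zero out a random subset of tokens (rows).

    Returns the masked copy and a boolean vector marking masked tokens.
    Zeroing is an assumption made for simplicity; masked-modeling
    pipelines often drop masked tokens or use a learned mask token.
    """
    n = tokens.shape[0]
    n_masked = int(n * mask_ratio)
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_masked]] = True
    masked = tokens.copy()
    masked[mask] = 0.0
    return masked, mask


rng = np.random.default_rng(0)
cube = rng.standard_normal((200, 7, 7))  # assumed patch: 200 bands, 7x7 pixels

spec = spectral_tokens(cube)  # shape (200, 49)
spat = spatial_tokens(cube)   # shape (49, 200)

spec_masked, spec_mask = random_mask(spec, 0.5, rng)
spat_masked, spat_mask = random_mask(spat, 0.5, rng)
```

In a pretraining setup of this kind, each branch would be trained to reconstruct its own masked tokens from the visible ones, so the spectral and spatial sequences are handled independently before any fusion for classification.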