Recently, researchers have begun applying vision transformers (ViTs) to hyperspectral image (HSI) classification and have achieved remarkable results. However, training ViT models requires a large number of labeled samples, whereas hyperspectral data, owing to high annotation costs, typically offers only a small number. This contradiction has not been effectively addressed. To solve this problem, we propose the single-direction tuning (SDT) strategy, which serves as a bridge that lets us leverage existing labeled HSI datasets, and even RGB datasets, to improve performance on new HSI datasets with limited samples. SDT inherits the idea of prompt tuning, which reuses pre-trained models with minimal modifications to adapt them to new tasks. Unlike prompt tuning, however, SDT is custom-designed to accommodate the characteristics of HSIs. It combines a parallel architecture, an asynchronous cold-hot gradient update strategy, and unidirectional interaction in order to fully harness the representation learning capability obtained by training on heterologous, and even cross-modal, datasets. In addition, we introduce a novel triplet-structured transformer (Tri-Former), in which spectral attention and spatial attention modules are merged in parallel to form the token mixing component, reducing computation cost, and a 3D convolution-based channel mixer module is integrated to enhance stability and preserve structural information. Comparison experiments on three representative HSI datasets captured by different sensors demonstrate that the proposed Tri-Former outperforms several state-of-the-art methods. Homologous, heterologous, and cross-modal tuning experiments verify the effectiveness of the proposed SDT.
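The abstract does not spell out how the parallel architecture, the cold-hot gradient update, or the unidirectional interaction are implemented. As a rough illustration only, the toy sketch below assumes the simplest reading: a pre-trained backbone whose weight is kept frozen, a small trainable side branch that runs alongside it, and information flowing only from the backbone into the branch. The scalar model, the names `w_backbone`, `w_branch`, `forward`, and `tune`, and the squared-error update rule are all hypothetical choices made for this example and are not taken from the paper.

```python
# Toy sketch of tuning a side branch next to a frozen backbone.
# Everything here is an illustrative assumption, not the paper's SDT.

def forward(x, w_backbone, w_branch):
    """Return (backbone feature, prediction) for a scalar input x."""
    # The backbone computes its feature without ever seeing the branch.
    h = w_backbone * x
    # The side branch consumes the backbone feature (one-way interaction)
    # and adds a learned correction on top of it.
    y = h + w_branch * h
    return h, y


def tune(samples, w_backbone, w_branch=0.0, lr=0.1, epochs=50):
    """Fit only the branch weight with SGD on a squared-error loss."""
    for _ in range(epochs):
        for x, target in samples:
            h, y = forward(x, w_backbone, w_branch)
            # d/dw_branch of (y - target)^2, with y = h + w_branch * h.
            grad_branch = 2.0 * (y - target) * h
            w_branch -= lr * grad_branch
            # No update is applied to w_backbone: it stays frozen.
    return w_backbone, w_branch


if __name__ == "__main__":
    # "New task": targets are 1.5 * x, while the backbone alone predicts 1.0 * x.
    data = [(1.0, 1.5), (2.0, 3.0)]
    w_b, w_s = tune(data, w_backbone=1.0)
    print("backbone weight:", w_b, "branch weight:", round(w_s, 3))
```

In this toy setting the branch weight converges toward 0.5, which closes the gap between the backbone's prediction and the new targets, while the backbone weight is returned unchanged. A real implementation would replace the scalars with a pre-trained transformer and a lightweight parallel network, and the exact update schedule would follow the paper rather than this sketch.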