Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various Earth observation applications, such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes and thereby reduces their practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes and scales to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic building block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features through a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset for pre-training, HyperGlobal-450K, which contains about 450K hyperspectral images and significantly surpasses existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared with current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferability, real-world applicability, and computational efficiency. The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA.