Masked image modeling (MIM) is a popular and effective self-supervised learning method for image understanding. Existing MIM-based methods mostly focus on spatial feature modeling and neglect spectral feature modeling. Moreover, because they rely on Transformers for feature extraction, some local or high-frequency information may be lost. To address both issues, we propose a spatial-spectral masked auto-encoder (SS-MAE) for the joint classification of hyperspectral image (HSI) and LiDAR/SAR data. Specifically, SS-MAE consists of a spatial-wise branch and a spectral-wise branch. The spatial-wise branch masks random patches and reconstructs the missing pixels, while the spectral-wise branch masks random spectral channels and reconstructs the missing channels. SS-MAE thereby exploits both the spatial and spectral representations of the input data. Furthermore, to complement local features in the training stage, we add two lightweight CNNs for feature extraction, so that both global and local features are taken into account in feature modeling. Extensive experiments on three publicly available multi-source datasets verify the superiority of SS-MAE over several state-of-the-art baselines. The source code is available at \url{https://github.com/summitgao/SS-MAE}.
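The two masking strategies described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the patch size, the mask ratios, and the choice to zero-fill masked entries are all assumptions made for illustration. MAE-style encoders typically drop masked tokens rather than zeroing them, and the reconstruction networks are omitted entirely.

```python
import numpy as np


def mask_spatial_patches(cube, patch_size, mask_ratio, rng):
    """Zero out a random subset of non-overlapping spatial patches.

    cube: array of shape (H, W, C); H and W must be divisible by patch_size.
    Returns the masked copy and a boolean (H/p, W/p) grid, True = masked.
    Hypothetical helper, not taken from the SS-MAE code base.
    """
    h, w, _ = cube.shape
    gh, gw = h // patch_size, w // patch_size
    num_masked = int(round(mask_ratio * gh * gw))

    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=num_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)

    # Expand the patch-level grid to pixel resolution.
    pixel_mask = grid.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    masked = cube.copy()
    masked[pixel_mask] = 0.0  # all channels of a masked pixel are hidden
    return masked, grid


def mask_spectral_channels(cube, mask_ratio, rng):
    """Zero out a random subset of spectral channels across all pixels.

    Returns the masked copy and a boolean (C,) vector, True = masked.
    Hypothetical helper, not taken from the SS-MAE code base.
    """
    c = cube.shape[-1]
    num_masked = int(round(mask_ratio * c))

    channel_mask = np.zeros(c, dtype=bool)
    channel_mask[rng.choice(c, size=num_masked, replace=False)] = True

    masked = cube.copy()
    masked[..., channel_mask] = 0.0
    return masked, channel_mask


# Toy hyperspectral patch: 8 x 8 pixels, 10 bands, values in [1, 2)
# so that zeros unambiguously mark masked entries.
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 10)) + 1.0

# Assumed settings: 2 x 2 patches with 75% masked, and 50% of bands masked.
spatial_masked, grid = mask_spatial_patches(cube, 2, 0.75, rng)
spectral_masked, channels = mask_spectral_channels(cube, 0.5, rng)
```

In this sketch the spatial branch hides whole pixel neighborhoods across every band, while the spectral branch hides whole bands across every pixel, so the two reconstruction targets are complementary.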