Self-Supervised In-Domain Representation Learning for Remote Sensing Image Scene Classification

Transferring ImageNet pre-trained weights to remote sensing tasks has produced acceptable results and reduced the need for labeled samples. However, the domain gap between ground-level imagery and remote sensing imagery limits the performance of such transfer learning. Recent research has demonstrated that self-supervised learning methods capture visual features that are more discriminative and transferable than supervised ImageNet weights. Motivated by these findings, we pre-train in-domain representations of remote sensing imagery using contrastive self-supervised learning and transfer the learned features to other related remote sensing datasets. Specifically, we use the SimSiam algorithm to pre-train on remote sensing datasets and then transfer the obtained weights to other scene classification datasets and downstream tasks, where they serve as initial weights for fine-tuning. With this approach, we obtain state-of-the-art results on five land cover classification datasets with varying numbers of classes and spatial resolutions. In addition, by conducting experiments that include pre-training on datasets with different attributes, we identify the most influential factors that make a dataset a good choice for obtaining in-domain features. We also linearly evaluate the obtained representations in cases where the number of samples per class is limited. Our experiments demonstrate that using a higher-resolution dataset during the self-supervised pre-training stage yields more discriminative and general representations.
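
For readers unfamiliar with SimSiam, its training objective can be sketched as a symmetrized negative cosine similarity between two augmented views of the same image. The snippet below is a minimal illustration in plain Python, not the implementation used in this work. The names `negative_cosine` and `simsiam_loss` and the toy vectors are assumptions introduced here. The stop-gradient on the target branch, which SimSiam relies on, is noted only in comments because plain Python has no automatic differentiation.

```python
import math


def negative_cosine(p, z):
    """Negative cosine similarity between a predictor output p and a target z.

    In SimSiam the target z is treated as a constant (stop-gradient).
    That is noted here but not enforced, since no gradients are computed.
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetrized loss over two views.

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical toy vectors).
    z1, z2: projector outputs for view 1 and view 2, used as fixed targets.
    Each view's prediction is compared with the other view's projection.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


if __name__ == "__main__":
    # Toy example: each prediction points in the same direction as its target.
    loss = simsiam_loss([1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [3.0, 0.0])
    print(loss)
```

The loss reaches its minimum of -1 when each view's prediction is perfectly aligned with the other view's projection. In a real setup the vectors would come from a shared encoder and a predictor head trained with an autograd framework.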