GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation

Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task that extracts, from a large number of unlabeled images, features that are beneficial for downstream tasks. However, existing instance-discrimination-based SSCL methods suffer from two limitations when applied to RSI semantic segmentation: 1) the positive sample confounding issue; and 2) feature adaptation bias, which arises when instance discrimination is applied to semantic segmentation tasks that require pixel-level or object-level features. In this study, we observe that the discrimination information can be mapped to specific regions in an RSI through the gradient of the unsupervised contrastive loss, and that these regions tend to contain singular ground objects. Based on this, we propose contrastive learning with Gradient guided Sampling Strategy (GraSS) for RSI semantic segmentation. GraSS consists of two stages: Instance Discrimination warm-up (ID warm-up) and Gradient guided Sampling contrastive training (GS training). The ID warm-up stage provides initial discrimination information to the contrastive loss gradients. The GS training stage uses the discrimination information contained in the contrastive loss gradients to adaptively select regions in RSI patches that contain more singular ground objects, and constructs new positive and negative samples from them. Experimental results on three open datasets demonstrate that GraSS effectively enhances the performance of SSCL in high-resolution RSI semantic segmentation. Compared to seven baseline methods from five different types of SSCL, GraSS achieves an average improvement of 1.57% and a maximum improvement of 3.58% in mean intersection over union (mIoU). The source code is available at https://github.com/GeoX-Lab/GraSS
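
To make the gradient-guided sampling idea concrete, the following is a minimal sketch and not the authors' implementation (the repository above is the reference for that). It rests on several assumptions that the abstract does not confirm: an InfoNCE-style contrastive loss with unnormalized dot-product logits, a query embedding obtained by global average pooling of a spatial feature map, a Grad-CAM-style attention map built from the negative loss gradient, and a fixed-size window as the "selected region". The names `info_nce_grad`, `attention_map`, and `best_window` are hypothetical.

```python
import math

def info_nce_grad(q, k_pos, k_negs, tau=0.2):
    """Return (loss, dL/dq) for an InfoNCE loss with dot-product logits (assumed form)."""
    keys = [k_pos] + k_negs
    logits = [sum(a * b for a, b in zip(q, k)) / tau for k in keys]
    m = max(logits)                                # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]                  # softmax over positive + negatives
    loss = -math.log(probs[0])
    # Analytic gradient: dL/dq = (sum_i p_i * k_i - k_pos) / tau
    grad = [
        (sum(p * k[c] for p, k in zip(probs, keys)) - k_pos[c]) / tau
        for c in range(len(q))
    ]
    return loss, grad

def attention_map(feat, k_pos, k_negs, tau=0.2):
    """feat is an H x W x D nested list; returns an H x W attention map."""
    h, w, d = len(feat), len(feat[0]), len(feat[0][0])
    # Query embedding by global average pooling (assumption).
    q = [sum(feat[i][j][c] for i in range(h) for j in range(w)) / (h * w)
         for c in range(d)]
    _, grad = info_nce_grad(q, k_pos, k_negs, tau)
    # Grad-CAM-style weighting (assumption): a location scores high when its
    # features point along the direction that lowers the contrastive loss.
    return [[max(0.0, -sum(g * x for g, x in zip(grad, feat[i][j])))
             for j in range(w)] for i in range(h)]

def best_window(att, size):
    """Top-left corner (row, col) of the size x size window with the largest attention sum."""
    h, w = len(att), len(att[0])
    best, best_pos = -1.0, (0, 0)
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            s = sum(att[a][b] for a in range(i, i + size) for b in range(j, j + size))
            if s > best:
                best, best_pos = s, (i, j)
    return best_pos

# Toy 4 x 4 "patch": a 2 x 2 object with feature [1, 0] on a background of [0, 1].
feat = [[[1.0, 0.0] if i in (1, 2) and j in (2, 3) else [0.0, 1.0]
         for j in range(4)] for i in range(4)]
att = attention_map(feat, k_pos=[1.0, 0.0], k_negs=[[0.0, 1.0]])
print(best_window(att, size=2))
```

In this toy setting the attention is nonzero only on the 2 x 2 object, so the selected window starts at row 1, column 2. In a GS-training-like loop, crops taken at such windows would then serve as the new positive and negative samples; how GraSS actually forms them is specified in the paper and repository, not here.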
