GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation

Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task that extracts features beneficial for downstream tasks from a large number of unlabeled images. However, existing instance-discrimination-based SSCL suffers from two limitations when applied to RSI semantic segmentation: 1) a positive sample confounding issue; and 2) a feature adaptation bias, introduced when instance-level discrimination is applied to semantic segmentation tasks that require pixel-level or object-level features. In this study, we observe that the discrimination information can be mapped to specific regions in an RSI through the gradient of the unsupervised contrastive loss, and that these regions tend to contain singular ground objects. Based on this observation, we propose contrastive learning with a Gradient guided Sampling Strategy (GraSS) for RSI semantic segmentation. GraSS consists of two stages: Instance Discrimination warm-up (ID warm-up) and Gradient guided Sampling contrastive training (GS training). The ID warm-up stage provides initial discrimination information to the contrastive loss gradients. The GS training stage uses the discrimination information contained in the contrastive loss gradients to adaptively select regions in RSI patches that contain more singular ground objects, and constructs new positive and negative samples from those regions. Experimental results on three open datasets demonstrate that GraSS effectively enhances the performance of SSCL in high-resolution RSI semantic segmentation. Compared to seven baseline methods from five different types of SSCL, GraSS achieves an average improvement of 1.57% and a maximum improvement of 3.58% in mean intersection over union (mIoU). The source code is available at https://github.com/GeoX-Lab/GraSS
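
To make the idea of gradient-guided sampling concrete, the following is a minimal NumPy sketch and not the authors' implementation (see the repository above for that). It assumes an InfoNCE-style contrastive loss, a query embedding obtained by global average pooling of a feature map, and a Grad-CAM-style channel weighting to turn the loss gradient into a spatial map; the abstract specifies none of these. The function names `info_nce_grad`, `attention_map`, and `best_window`, the temperature `tau`, and the fixed square window are all hypothetical choices made for illustration.

```python
import numpy as np


def info_nce_grad(q, keys, pos_idx, tau=0.2):
    """Analytic gradient of an InfoNCE loss w.r.t. the query embedding q.

    Assumed loss: L = -log softmax(keys @ q / tau)[pos_idx].
    Its gradient is dL/dq = (sum_i p_i * k_i - k_pos) / tau,
    where p is the softmax over all keys (positive included).
    """
    logits = keys @ q / tau
    logits = logits - logits.max()      # numerical stability
    p = np.exp(logits)
    p = p / p.sum()
    return (p @ keys - keys[pos_idx]) / tau


def attention_map(feat, grad_q):
    """Grad-CAM-style spatial map from the loss gradient (an assumption).

    feat has shape (C, H, W). Because q is assumed to be the spatial mean
    of feat, every location in channel c shares the gradient grad_q[c]
    (up to a constant 1 / (H * W)). Channels are weighted by the negative
    loss gradient, summed, and clipped at zero.
    """
    cam = np.tensordot(-grad_q, feat, axes=(0, 0))   # shape (H, W)
    return np.maximum(cam, 0.0)


def best_window(att, size):
    """Top-left corner of the size x size window with the largest summed
    attention, computed with an integral image."""
    h, w = att.shape
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = att.cumsum(axis=0).cumsum(axis=1)
    sums = (ii[size:, size:] - ii[:-size, size:]
            - ii[size:, :-size] + ii[:-size, :-size])
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return int(r), int(c)
```

In this sketch, a patch's feature map would be passed through `info_nce_grad` and `attention_map`, and the window returned by `best_window` would be cropped to form a new sample. How GraSS actually thresholds the gradient response and builds positive and negative samples is not described in the abstract, so that step is left out.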
