Remote sensing images play an irreplaceable role in fields such as agriculture, water resources, military, and disaster relief. Pixel-level interpretation is a critical aspect of remote sensing image applications; however, it is commonly limited by the need for extensive manual annotation. To address this, we introduce open-vocabulary semantic segmentation (OVSS) into the remote sensing context. However, because remote sensing images are sensitive to low-resolution features, the predicted masks exhibit distorted target shapes and ill-fitting boundaries. To tackle this issue, we propose SimFeatUp, a simple and general upsampler that restores the spatial information lost in deep features in a training-free manner. Further, based on the observation that local patch tokens in CLIP respond abnormally to the [CLS] token, we propose a straightforward subtraction operation to alleviate the global bias in patch tokens. Extensive experiments are conducted on 17 remote sensing datasets spanning semantic segmentation, building extraction, road detection, and flood detection tasks. Our method achieves average improvements of 5.8%, 8.2%, 4.0%, and 15.3% over state-of-the-art methods on these four tasks, respectively. All code is released at \url{https://earth-insights.github.io/SegEarth-OV}.
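The idea behind the [CLS]-token subtraction can be illustrated with a minimal sketch. It assumes that the subtraction removes a scaled copy of the [CLS] token from every patch token, and that patches are then labeled by cosine similarity to class text embeddings, as in CLIP-style training-free segmentation. The weight `lam`, the helper names, and the toy vectors are hypothetical and chosen for illustration; they are not taken from the released code.

```python
import math
from typing import List

Vector = List[float]


def subtract_global_bias(patch_tokens: List[Vector], cls_token: Vector,
                         lam: float = 0.3) -> List[Vector]:
    """Remove a scaled copy of the [CLS] token from each patch token.

    Assumption: `lam` is a hypothetical weighting factor for illustration.
    """
    return [[p - lam * c for p, c in zip(patch, cls_token)]
            for patch in patch_tokens]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def classify_patches(patch_tokens: List[Vector],
                     text_embeddings: List[Vector]) -> List[int]:
    """Label each patch with the index of the most cosine-similar text embedding."""
    texts = [l2_normalize(t) for t in text_embeddings]
    labels = []
    for patch in patch_tokens:
        p = l2_normalize(patch)
        scores = [sum(a * b for a, b in zip(p, t)) for t in texts]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 3-d example: the [CLS] direction dominates every patch token.
    cls_token = [1.0, 0.0, 0.0]
    # Hypothetical class embeddings: class 0, class 1, and a "global" class 2
    # that happens to align with the [CLS] direction.
    text_embeddings = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    patches = [[0.9, 0.3, 0.0], [0.9, 0.0, 0.3]]

    print(classify_patches(patches, text_embeddings))  # → [2, 2]
    debiased = subtract_global_bias(patches, cls_token, lam=1.0)
    print(classify_patches(debiased, text_embeddings))  # → [0, 1]
```

In this contrived example, the shared global component makes every patch match the same class; once it is subtracted, the local differences between patches decide the labels. The actual weighting, and whether it is applied before or after normalization, should be checked against the released implementation.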