RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

The development of high-resolution remote sensing satellites has greatly facilitated remote sensing research. Given the volume and complexity of remote sensing imagery, segmenting and extracting specific targets are essential tasks. Recently, the Segment Anything Model (SAM) has provided a universal pre-trained model for image segmentation. However, applying SAM directly to remote sensing image segmentation does not yield satisfactory results. We therefore propose RSAM-Seg (Remote Sensing SAM with Semantic Segmentation), a modification of SAM tailored to the remote sensing field that eliminates the need for manually provided prompts. Adapter-Scale, a set of supplementary scaling modules, is added to the multi-head attention blocks of SAM's encoder. Furthermore, Adapter-Feature modules are inserted between the Vision Transformer (ViT) blocks. These modules are designed to incorporate high-frequency image information and image embedding features to generate image-informed prompts. Experiments are conducted on four distinct remote sensing scenarios: cloud detection, field monitoring, building detection, and road mapping. The results show improvements over the original SAM and U-Net in the cloud, building, field, and road scenarios. They also highlight RSAM-Seg's ability to identify areas missing from the ground truth of certain datasets, supporting its potential as an auxiliary annotation method. In addition, its strong performance in few-shot scenarios underscores its potential for handling limited datasets.
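The abstract describes the adapters only at a high level. The NumPy sketch below illustrates one plausible reading of the Adapter-Feature idea. A high-frequency map is extracted with an FFT high-pass filter and pooled into per-patch tokens. These tokens are fused with ViT embedding tokens, and the result passes through a bottleneck adapter with a scaled residual. The mask ratio, bottleneck width, residual scale, average pooling, random weights, and all names (`high_frequency_map`, `patch_tokens`, `FeatureAdapter`) are hypothetical assumptions. This is not the authors' implementation, and it does not model Adapter-Scale or SAM's prompt and mask decoders.

```python
import numpy as np


def high_frequency_map(image: np.ndarray, mask_ratio: float = 0.25) -> np.ndarray:
    """High-pass filter a 2-D image in the Fourier domain (assumed FFT-based extraction)."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move low frequencies to the center
    ch, cw = h // 2, w // 2
    rh, rw = int(h * mask_ratio) // 2, int(w * mask_ratio) // 2
    # Zero a central block of low frequencies (always includes the DC term).
    spectrum[ch - rh:ch + rh + 1, cw - rw:cw + rw + 1] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def patch_tokens(feature_map: np.ndarray, patch: int) -> np.ndarray:
    """Average-pool an (H, W) map over non-overlapping patches, returning (N, 1) tokens."""
    h, w = feature_map.shape  # H and W must be divisible by `patch`
    pooled = feature_map.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return pooled.reshape(-1, 1)


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


class FeatureAdapter:
    """Hypothetical Adapter-Feature-style bottleneck placed between two ViT blocks."""

    def __init__(self, dim: int, bottleneck: int, scale: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_hf = rng.normal(0.0, 0.02, (1, dim))  # lifts the per-patch scalar to token width
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        self.w_up = rng.normal(0.0, 0.02, (bottleneck, dim))
        self.scale = scale  # assumed residual scaling factor

    def __call__(self, tokens: np.ndarray, hf_tokens: np.ndarray) -> np.ndarray:
        # Fuse embedding tokens with high-frequency cues, squeeze them through
        # the bottleneck, and add the result back as a scaled residual.
        fused = tokens + hf_tokens @ self.w_hf
        return tokens + self.scale * (gelu(fused @ self.w_down) @ self.w_up)


# Usage with random stand-ins for an image tile and ViT tokens.
rng = np.random.default_rng(42)
image = rng.random((64, 64))
hf_tok = patch_tokens(high_frequency_map(image), patch=16)  # 4 x 4 = 16 patches
tokens = rng.normal(size=(16, 32))  # stand-in for 16 ViT tokens of width 32
adapter = FeatureAdapter(dim=32, bottleneck=8)
out = adapter(tokens, hf_tok)
print(out.shape)  # (16, 32)
```

The residual form means that a small `scale` leaves the incoming tokens nearly unchanged. This is a common choice when attaching lightweight adapters to a large pre-trained backbone such as SAM's ViT encoder.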
