In artificial intelligence, the emergence of foundation models, backed by large-scale computing and extensive data, has been revolutionary. The Segment Anything Model (SAM), built on the Vision Transformer (ViT) with millions of parameters and trained on the vast SA-1B dataset, excels in a wide range of segmentation scenarios thanks to its rich semantic information and generalization ability. This success of a visual foundation model has stimulated ongoing research on specific downstream tasks in computer vision. The ClassWise-SAM-Adapter (CWSAM) is designed to adapt the high-performing SAM to landcover classification on space-borne Synthetic Aperture Radar (SAR) images. CWSAM freezes most of SAM's parameters and incorporates lightweight adapters for parameter-efficient fine-tuning, and a classwise mask decoder is designed to perform the semantic segmentation task. This adapt-tuning method allows for efficient landcover classification of SAR images, balancing accuracy against computational demand. In addition, a task-specific input module injects low-frequency information from SAR images through MLP-based layers to improve model performance. In extensive experiments against conventional state-of-the-art semantic segmentation algorithms, CWSAM achieves better performance with fewer computing resources, highlighting the potential of leveraging foundation models like SAM for specific downstream tasks in the SAR domain. The source code is available at: https://github.com/xypu98/CWSAM.
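To make the idea of freezing the backbone and training only lightweight adapters more concrete, the following is a minimal, dependency-free sketch of a generic bottleneck adapter. It is not taken from the CWSAM code base: the `Adapter` class, the `trainable_fraction` helper, the GELU activation, the zero-initialised up-projection, and the toy sizes (`dim=8`, `bottleneck=2`) are all illustrative assumptions, and the actual adapter design in CWSAM may differ.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector x (a list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(v):
    """Exact GELU, computed with the Gaussian error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


class Adapter:
    """Hypothetical bottleneck adapter: x + up(gelu(down(x))).

    In adapter-style fine-tuning only these weights would be trained,
    while the backbone weights stay frozen.
    """

    def __init__(self, dim, bottleneck, rng):
        scale = 1.0 / math.sqrt(dim)
        self.down_w = [[rng.uniform(-scale, scale) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.down_b = [0.0] * bottleneck
        # Assumption: zero-initialised up-projection, so the adapter starts
        # as an identity map and leaves the frozen features unchanged.
        self.up_w = [[0.0] * bottleneck for _ in range(dim)]
        self.up_b = [0.0] * dim

    def __call__(self, x):
        hidden = [gelu(h) for h in linear(x, self.down_w, self.down_b)]
        delta = linear(hidden, self.up_w, self.up_b)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection

    def num_params(self):
        dim, bottleneck = len(self.up_w), len(self.down_w)
        return 2 * dim * bottleneck + dim + bottleneck


def trainable_fraction(frozen_params, adapter_params):
    """Share of all parameters that would receive gradient updates."""
    return adapter_params / (frozen_params + adapter_params)


rng = random.Random(0)
adapter = Adapter(dim=8, bottleneck=2, rng=rng)
features = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for frozen backbone features
adapted = adapter(features)
```

Because the up-projection starts at zero, `adapted` equals `features` before any training, so inserting the adapter does not disturb the pretrained behaviour at initialisation. The adapter adds `2 * dim * bottleneck + dim + bottleneck` parameters per insertion point, which stays small relative to the frozen backbone whenever the bottleneck is much narrower than the feature dimension.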