Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural-light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of SAM's decoder might lead to the loss of fine-grained object details. To address these issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation (MAS), which integrates effective adapters into SAM's encoder and constructs a pyramidal decoder. More specifically, we first build a new SAM encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method can extract richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that MAS-SAM obtains better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.
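To make the coarse-to-fine idea behind a progressive decoder concrete, the following is a minimal toy sketch in pure Python, not the authors' implementation. It assumes a pyramid of single-channel feature maps ordered from coarsest to finest, each twice the resolution of the previous one. The function names (`upsample2x`, `fuse`, `progressive_decode`, `to_mask`) and the fixed blend weight `alpha` are hypothetical; in particular, `fuse` is a plain convex blend standing in for a learned fusion step, and it does not reproduce the FAM described above.

```python
import math

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list of floats."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def fuse(coarse_up, fine, alpha=0.5):
    """Hypothetical stand-in for a fusion step: a fixed convex blend."""
    return [[alpha * c + (1 - alpha) * f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(coarse_up, fine)]

def progressive_decode(pyramid, alpha=0.5):
    """Aggregate maps ordered coarsest to finest, one scale at a time."""
    current = pyramid[0]
    for finer in pyramid[1:]:
        current = fuse(upsample2x(current), finer, alpha)
    return current

def to_mask(logits, threshold=0.5):
    """Apply a sigmoid to each logit and threshold it into a binary mask."""
    return [[1 if 1 / (1 + math.exp(-v)) > threshold else 0 for v in row]
            for row in logits]

# Toy three-level pyramid (assumed values, for illustration only).
pyramid = [
    [[4.0]],                              # 1x1: global context
    [[2.0, -6.0], [-6.0, -6.0]],          # 2x2: intermediate scale
    [[0.0] * 4 for _ in range(4)],        # 4x4: finest scale
]
mask = to_mask(progressive_decode(pyramid))
```

In this toy run the coarse map supplies a global "object present" signal, and each finer level narrows down where it applies. A real decoder would replace the fixed blend with learned layers operating on multi-channel tensors.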