As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods struggle to extract long-range contextual features and overlook the connectivity between discrete pixels. Recently, the Segment Anything Model (SAM) has offered a universal framework for general segmentation tasks. However, because it is trained on natural images, SAM lacks prior knowledge of marine images. In addition, SAM's single-position prompt is insufficient for prior guidance. To address these issues, we propose a novel feature learning framework, named Dual-SAM, for high-performance MAS. We first introduce a dual structure based on SAM's paradigm to enhance feature learning on marine images. Then, we propose a Multi-level Coupled Prompt (MCP) strategy to provide comprehensive underwater prior information, and we enhance the multi-level features of SAM's encoder with adapters. Subsequently, we design a Dilated Fusion Attention Module (DFAM) to progressively integrate multi-level features from SAM's encoder. Finally, instead of directly predicting the masks of marine animals, we propose a Criss-Cross Connectivity Prediction (C$^3$P) paradigm to capture the inter-connectivity between discrete pixels. With dual decoders, it generates pseudo-labels and achieves mutual supervision for complementary feature representations, resulting in considerable improvements over previous techniques. Extensive experiments verify that our proposed method achieves state-of-the-art performance on five widely used MAS datasets. The code is available at https://github.com/Drchip61/Dual_SAM.
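To make the idea of predicting connectivity rather than a mask more concrete, the following is a minimal sketch of how a binary mask could be turned into four directional connectivity maps. It is an illustration under stated assumptions, not the C$^3$P formulation itself: the abstract does not specify the neighborhood or the encoding, so the four directions (up, down, left, right), the rule that a pixel is connected when both it and its neighbor are foreground, and the name `criss_cross_connectivity` are all hypothetical.

```python
from typing import List

Mask = List[List[int]]

# Assumed neighborhood: (row, column) offsets for up, down, left, right.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def criss_cross_connectivity(mask: Mask) -> List[Mask]:
    """Return four maps, one per direction in DIRECTIONS.

    A cell is 1 when the pixel and its neighbor in that direction are
    both foreground; neighbors outside the image count as background.
    This encoding is an assumption made for illustration only.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    maps: List[Mask] = []
    for dy, dx in DIRECTIONS:
        conn = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if mask[y][x] and inside and mask[ny][nx]:
                    conn[y][x] = 1
        maps.append(conn)
    return maps


# Small example: an L-shaped foreground region in a 3x3 image.
example = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
up, down, left, right = criss_cross_connectivity(example)
```

Under this assumed encoding, each map records a pairwise relation between adjacent pixels, so a target built this way describes how foreground pixels link to one another and not only which pixels are foreground.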