Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods struggle to extract long-range contextual features and overlook the connectivity between discrete pixels. Recently, the Segment Anything Model (SAM) has offered a universal framework for general segmentation tasks. Unfortunately, because SAM is trained on natural images, it lacks prior knowledge of marine images. In addition, SAM's single-position prompt provides insufficient prior guidance. To address these issues, we propose a novel feature learning framework, named Dual-SAM, for high-performance MAS. Specifically, we first introduce a dual structure based on SAM's paradigm to enhance feature learning on marine images. Then, we propose a Multi-level Coupled Prompt (MCP) strategy to supply comprehensive underwater prior information, and we enhance the multi-level features of SAM's encoder with adapters. Subsequently, we design a Dilated Fusion Attention Module (DFAM) to progressively integrate multi-level features from SAM's encoder. Finally, instead of directly predicting the masks of marine animals, we propose a Criss-Cross Connectivity Prediction (C$^3$P) paradigm to capture the inter-connectivity between discrete pixels. With dual decoders, it generates pseudo-labels and achieves mutual supervision for complementary feature representations, yielding considerable improvements over previous techniques. Extensive experiments verify that our proposed method achieves state-of-the-art performance on five widely used MAS datasets. The code is available at https://github.com/Drchip61/Dual_SAM.
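
The abstract names the connectivity-prediction and mutual-supervision ideas but not their exact formulation. As a rough illustration only, the sketch below assumes a 4-neighbour "criss-cross" neighbourhood (up, down, left, right), derives per-direction connectivity targets from a binary mask, decodes connectivity scores back into a mask with a simple any-connection rule, and scores two decoders against each other's thresholded predictions in the style of cross pseudo supervision. The offsets, the decoding rule, the loss, and all function names (`shift`, `mask_to_connectivity`, `connectivity_to_mask`, `bce`, `mutual_supervision_loss`) are hypothetical and are not taken from the Dual-SAM code.

```python
import numpy as np

# Hypothetical "criss-cross" neighbourhood: the four axis-aligned offsets
# (up, down, left, right). Dual-SAM's actual C^3P definition may differ.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def shift(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Return out with out[y, x] = mask[y + dy, x + dx]; reads outside the image give 0."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    dst_y = slice(max(0, -dy), min(h, h - dy))
    dst_x = slice(max(0, -dx), min(w, w - dx))
    src_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out


def mask_to_connectivity(mask: np.ndarray) -> np.ndarray:
    """(H, W) binary mask -> (K, H, W) connectivity labels, one channel per offset.

    Channel k is 1 where a pixel and its k-th neighbour are both foreground,
    so the target encodes pixel-pair relations instead of per-pixel labels.
    """
    mask = mask.astype(np.uint8)
    return np.stack([mask & shift(mask, dy, dx) for dy, dx in OFFSETS])


def connectivity_to_mask(conn: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """(K, H, W) connectivity scores -> (H, W) binary mask.

    Hypothetical decoding rule: a pixel is foreground if any of its K
    connections is predicted as connected.
    """
    return (conn > threshold).any(axis=0).astype(np.uint8)


def bce(prob: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between probabilities and {0, 1} targets."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-(target * np.log(prob) + (1 - target) * np.log(1 - prob)).mean())


def mutual_supervision_loss(prob_a: np.ndarray, prob_b: np.ndarray,
                            threshold: float = 0.5) -> float:
    """Cross pseudo-supervision between two decoders (a swapped-in scheme).

    Each decoder's probabilities are scored against the other decoder's
    thresholded predictions. In a real training loop the pseudo-labels would
    be treated as constants (no gradient); NumPy has no autograd, so this only
    computes the loss value.
    """
    pseudo_from_b = (prob_b > threshold).astype(np.float64)
    pseudo_from_a = (prob_a > threshold).astype(np.float64)
    return bce(prob_a, pseudo_from_b) + bce(prob_b, pseudo_from_a)


# Toy example: a 2x2 object plus one isolated foreground pixel.
mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=np.uint8)

conn = mask_to_connectivity(mask)                        # shape (4, 4, 4)
recovered = connectivity_to_mask(conn.astype(np.float64))
print(conn.shape, int(conn.sum()))
print(recovered)
```

In this toy mask the 2x2 object is recovered exactly, while the isolated pixel in the bottom-right corner has no foreground 4-neighbour, so all of its connectivity labels are 0 and it disappears after decoding. The channels are also redundant by construction: the "up" label at (y, x) equals the "down" label at (y - 1, x), so each pixel pair is described twice.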
