Although existing SAM adaptation methods, such as prompt-based and adapter-based ones, have achieved promising performance on various downstream tasks, most of them follow a one-step adaptation paradigm. In real-world scenarios, however, we are often confronted with dynamic settings where data arrives in a streaming manner. Driven by this practical need, in this paper we first propose a novel Continual SAM Adaptation (CoSAM) benchmark spanning 8 distinct task domains and carefully analyze the limitations of existing one-step SAM adaptation methods in the continual segmentation scenario. We then propose a simple yet effective Mixture of Domain Adapters (MoDA) algorithm, which leverages Global Feature Tokens (GFT) and Global Assistant Tokens (GAT) modules to help the SAM encoder extract well-separated features for different task domains and thereby provide accurate task-specific information for continual learning. Extensive experiments demonstrate that our proposed MoDA clearly surpasses classic continual learning methods, as well as prompt-based and adapter-based approaches, on continual segmentation. Moreover, after sequential learning on the CoSAM benchmark with diverse data distributions, MoDA maintains highly competitive results on the natural image domain, approaching the zero-shot performance of the original SAM and demonstrating its superior capability for knowledge preservation. Notably, MoDA can be seamlessly integrated into various one-step SAM adaptation methods, consistently yielding clear performance gains. Code is available at \url{https://github.com/yangjl1215/CoSAM}