The Segment Anything Model (SAM) has demonstrated excellent generalization in common vision scenarios, yet it lacks an understanding of specialized data. Although numerous works have focused on optimizing SAM for downstream tasks, these task-specific approaches usually limit generalizability to other downstream tasks. In this paper, we investigate the impact of general vision modules on fine-tuning SAM, with the aim of enabling them to generalize across all downstream tasks. We propose a simple unified framework, called SimAda, for adapting SAM to scenes in which it underperforms. Specifically, our framework abstracts the general modules of different methods into basic design elements, and we design four variants based on a shared theoretical framework. SimAda is simple yet effective: it removes all dataset-specific designs and focuses solely on general optimization, which ensures that it can be applied to all SAM-based and even Transformer-based models. We conduct extensive experiments on nine datasets covering six downstream tasks. The results demonstrate that SimAda significantly improves the performance of SAM on multiple downstream tasks and achieves state-of-the-art performance on most of them, without requiring task-specific designs. Code is available at: https://github.com/zongzi13545329/SimAda