Foundation models, which are trained on massive datasets so that they can adapt to a wide range of domains, have recently attracted considerable attention and are being actively explored in the computer vision community. Among them, the Segment Anything Model (SAM) stands out for its generalizability and flexibility in image segmentation, which it achieves through prompt-based object mask generation. Despite these strengths, SAM faces two key limitations when applied to customized instance segmentation, that is, segmenting specific objects, or objects in unusual environments, that are not typically present in the training data: 1) the ambiguity inherent in input prompts and 2) the need for extensive additional training to achieve optimal segmentation. To address these challenges, we propose a novel method for customized instance segmentation via prompt learning tailored to SAM. Our method includes a prompt learning module (PLM), which adjusts input prompts in the embedding space to better reflect user intentions, thereby enabling more efficient training. Furthermore, we introduce a point matching module (PMM) that enhances the feature representation for finer segmentation by ensuring detailed alignment with ground-truth boundaries. Experimental results on various customized instance segmentation scenarios demonstrate the effectiveness of the proposed method.
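As a rough illustration of what adjusting a prompt in the embedding space could look like, the sketch below applies a small residual correction to a prompt embedding while leaving everything else untouched. The abstract does not specify the PLM architecture, so the residual two-layer form, the zero initialization, and the name `PromptAdjuster` are all assumptions made for illustration, not the authors' implementation.

```python
import random


class PromptAdjuster:
    """Hypothetical residual adapter: e' = e + W2 * relu(W1 * e).

    Illustrative only; the actual PLM design is not given in the abstract.
    """

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        # First layer: small random weights (assumed initialization).
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                   for _ in range(hidden)]
        # Second layer starts at zero, so the adapter is initially an
        # identity map and the original prompt embedding passes through.
        self.w2 = [[0.0] * hidden for _ in range(dim)]

    def __call__(self, embedding):
        # Hidden activations with ReLU.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, embedding)))
                  for row in self.w1]
        # Learned offset, added back to the input embedding.
        delta = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return [x + d for x, d in zip(embedding, delta)]


adjuster = PromptAdjuster(dim=4, hidden=8)
prompt_embedding = [0.5, -1.0, 2.0, 0.0]  # stand-in for an encoded point prompt
adjusted = adjuster(prompt_embedding)
```

In a setup like this, only the adapter weights would be trained while the segmentation model stays frozen, which is one plausible reading of how prompt learning could reduce the training cost mentioned above.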