SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation

Recently, contrastive language-image pre-training (e.g., CLIP) has demonstrated promising results on various downstream tasks. By learning from large-scale text-image data, the pre-trained model captures rich visual concepts for images. However, transferring the learned visual knowledge to open-vocabulary semantic segmentation is still under-explored. In this paper, we propose a CLIP-based model named SegCLIP for open-vocabulary segmentation in an annotation-free manner. SegCLIP performs segmentation on top of ViT; its main idea is to gather patches into semantic regions using learnable centers, trained on text-image pairs. The gathering operation dynamically captures semantic groups, which are then used to generate the final segmentation results. We further propose a reconstruction loss on masked patches and a superpixel-based KL loss with pseudo-labels to enhance the visual representation. Experimental results show that our model achieves comparable or superior segmentation accuracy relative to baselines on PASCAL VOC 2012 (+0.3% mIoU), PASCAL Context (+2.3% mIoU), and COCO (+2.2% mIoU). We release the code at https://github.com/ArrowLuo/SegCLIP.
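The gathering idea can be illustrated with a minimal sketch. This is not SegCLIP's actual module, whose details are not given in the abstract; it assumes a plain dot-product similarity, a hard argmax assignment, and mean pooling per center. The names `gather_patches`, `patches`, and `centers` are hypothetical, and the centers here are fixed values rather than trained parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gather_patches(patches, centers):
    """Assign each patch to its most similar center, then average per center.

    patches: list of N feature vectors.
    centers: list of K vectors of the same dimension (assumed learnable
             in a real model; plain constants in this sketch).
    Returns (assignments, regions), where assignments[i] is the center
    index chosen for patch i and regions[k] is the mean feature of the
    patches assigned to center k (or the center itself if none were).
    """
    dim = len(centers[0])
    assignments = []
    for p in patches:
        # Soft weights over centers; the argmax gives the hard assignment.
        probs = softmax([dot(p, c) for c in centers])
        assignments.append(max(range(len(centers)), key=probs.__getitem__))

    regions = []
    for k, c in enumerate(centers):
        members = [p for p, a in zip(patches, assignments) if a == k]
        if members:
            regions.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )
        else:
            regions.append(list(c))
    return assignments, regions


# Toy input: four 2-D "patch" features and two centers.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = [[1.0, 0.0], [0.0, 1.0]]
assignments, regions = gather_patches(patches, centers)
print(assignments)  # [0, 0, 1, 1]
```

In this toy setting the per-patch assignments already form a coarse segmentation map: each patch carries the index of the region it was gathered into, and each region feature could then be compared against text embeddings to pick a class name.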

