Open-vocabulary Object Segmentation with Diffusion Models

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model in the form of a segmentation map, i.e., to simultaneously generate images and segmentation masks for the visual entities described in the text prompt. We make the following contributions: (i) we pair the existing Stable Diffusion model with a novel grounding module that can be trained to align the visual and textual embedding spaces of the diffusion model using only a small number of object categories; (ii) we establish an automatic pipeline for constructing a dataset of {image, segmentation mask, text prompt} triplets to train the proposed grounding module; (iii) we evaluate open-vocabulary grounding on images generated by the text-to-image diffusion model and show that the module segments objects well even for categories not seen at training time; (iv) we use the augmented diffusion model to build a synthetic semantic segmentation dataset and show that a standard segmentation model trained on such a dataset achieves competitive performance on the zero-shot segmentation (ZS3) benchmark, which opens up new opportunities for adopting the powerful diffusion model for discriminative tasks.
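To make the {image, segmentation mask, text prompt} triplet and the idea of aligning visual and textual embeddings more concrete, here is a minimal sketch in plain Python. It is not the paper's grounding module: the names `Triplet` and `ground_mask`, the dot-product scoring, the sigmoid, and the 0.5 threshold are all assumptions chosen for illustration, and the abstract does not describe the module's actual architecture.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Triplet:
    """One hypothetical training example: image, mask, and prompt."""
    image: List[List[List[int]]]   # H x W x 3 pixel values
    mask: List[List[bool]]         # H x W, True where the object is
    prompt: str                    # text describing the object


def ground_mask(visual_feats, text_embed, threshold=0.5):
    """Illustrative grounding: score each pixel feature against a text embedding.

    visual_feats: H x W x D nested lists of per-pixel features (assumed given).
    text_embed:   length-D embedding of the category name (assumed given).
    Returns an H x W boolean mask. This is a stand-in, not the paper's module.
    """
    mask = []
    for row in visual_feats:
        mask_row = []
        for feat in row:
            # Dot product as a simple similarity between pixel and text.
            logit = sum(f * t for f, t in zip(feat, text_embed))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask_row.append(prob > threshold)
        mask.append(mask_row)
    return mask


# Toy example: a 2 x 2 feature map where only the top-left pixel matches the text.
feats = [[[5.0, 0.0], [-5.0, 0.0]],
         [[-5.0, 0.0], [-5.0, 0.0]]]
text = [1.0, 0.0]
example = Triplet(
    image=[[[0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0]]],
    mask=ground_mask(feats, text),
    prompt="a photo of a dog",
)
```

In this toy setup, pixels whose features point in the same direction as the text embedding end up inside the mask; a trained module would learn that alignment rather than have it hand-specified.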

