DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Current deep networks are very data-hungry and benefit from training on largescale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated infinitely using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks, and depth). Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module. Training the decoder only needs less than 1% (around 100 images) manually labeled images, enabling the generation of an infinitely large annotated dataset. Then these synthetic data can be used for training various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly more robust on domain generalization than using the real data alone; and state-of-the-art results in zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively

翻译：当前深度网络对数据需求极高，且依赖大规模数据集的训练，而这类数据集通常收集与标注耗时费力。相比之下，使用DALL-E和扩散模型等生成模型可以极低成本轻松生成无限量的合成数据。本文提出DatasetDM——一种通用数据集生成模型，能够产生多样化的合成图像及对应的高质量感知标注（如分割掩膜、深度图）。该方法基于预训练扩散模型，将文本引导的图像合成扩展至感知数据生成。研究表明，通过解码器模块可有效将扩散模型丰富的隐编码解码为精确的感知标注。训练该解码器仅需不足1%（约100张）人工标注图像，即可生成无限量带标注数据集。这些合成数据可用于训练下游任务中的各类感知模型。为展示该方法的能力，我们为语义分割、实例分割和深度估计等广泛下游任务生成了具备密集像素级标注的数据集。值得注意的是，该方法实现了：1）语义分割与实例分割的最优结果；2）域泛化性能显著优于仅使用真实数据；在零样本分割设定中也达到最优结果；3）高效应用与新型任务组合（如图像编辑）的灵活性。项目网站与代码分别参见https://weijiawu.github.io/DatasetDM_page/ 和 https://github.com/showlab/DatasetDM。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日