Current deep networks are data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated in unlimited quantities using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon a pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations using a decoder module. Training the decoder requires less than 1% (around 100 images) of manually labeled images, enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly greater robustness in domain generalization than using the real data alone, together with state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively.
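The workflow described in the abstract (keep the generator frozen, train a small decoder on about 100 labeled images, then label as many generated images as needed) can be illustrated with a toy sketch. This is not the DatasetDM implementation: every name here (`generate_image`, `frozen_features`, `train_decoder`, `decode_mask`) is hypothetical, a fixed per-pixel feature map stands in for the diffusion model's latent code, and a per-pixel logistic regression stands in for the perception decoder. The toy data (bright foreground, dark background) is an assumption chosen so the example stays small.

```python
import math
import random


def generate_image(rng, num_pixels=8):
    """Toy stand-in for image synthesis: returns pixels plus the hidden true mask."""
    mask = [1 if rng.random() < 0.5 else 0 for _ in range(num_pixels)]
    # Assumption: foreground pixels are bright (0.7-1.0), background dark (0.0-0.3).
    pixels = [rng.uniform(0.7, 1.0) if m else rng.uniform(0.0, 0.3) for m in mask]
    return pixels, mask


def frozen_features(pixel):
    """Toy stand-in for the frozen generator's latent code at one pixel (never trained)."""
    return [2.0 * pixel - 1.0, 1.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_decoder(labeled, steps=200, lr=1.0):
    """Fit a per-pixel logistic-regression decoder with full-batch gradient descent."""
    samples = [
        (frozen_features(p), y)
        for pixels, mask in labeled
        for p, y in zip(pixels, mask)
    ]
    weights = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for feats, y in samples:
            err = sigmoid(sum(w * f for w, f in zip(weights, feats))) - y
            grad = [g + err * f for g, f in zip(grad, feats)]
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights


def decode_mask(weights, pixels):
    """Predict a binary mask from the frozen features using the trained decoder."""
    return [
        1 if sum(w * f for w, f in zip(weights, frozen_features(p))) > 0 else 0
        for p in pixels
    ]


rng = random.Random(0)

# The small manually annotated set (about 100 images, as in the abstract).
labeled = [generate_image(rng) for _ in range(100)]
decoder = train_decoder(labeled)

# Generate further images; the decoder, not a human, supplies the annotations.
synthetic = []
correct = total = 0
for _ in range(1000):
    pixels, true_mask = generate_image(rng)
    pred_mask = decode_mask(decoder, pixels)
    synthetic.append((pixels, pred_mask))
    # The hidden true mask is used only to measure decoder quality in this toy.
    correct += sum(int(a == b) for a, b in zip(pred_mask, true_mask))
    total += len(true_mask)

accuracy = correct / total
print(f"synthetic pairs: {len(synthetic)}, mask accuracy: {accuracy:.3f}")
```

The point of the sketch is the division of labor: only `train_decoder` consumes human labels, while the loop that builds `synthetic` can run for as many samples as desired. In the real system the features would come from the pre-trained diffusion model during text-guided generation rather than from a hand-written function.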