Denoising diffusion probabilistic models (DDPMs) have been shown to synthesize high-quality images with remarkable diversity when trained on large amounts of data. However, both typical diffusion models and modern large-scale conditional generative models, such as text-to-image models, are prone to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images, but few prior works explore DDPM-based domain-driven generation, which aims to learn the common features of a target domain while maintaining diversity. This paper proposes DomainStudio, a novel approach that adapts DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to preserve the diversity of subjects provided by the source domain and to produce high-quality, diverse adapted samples in the target domain. We propose keeping the relative distances between adapted samples to achieve considerable generation diversity, and we further enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. Moreover, it significantly relieves overfitting in conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.
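The two ideas named above, keeping relative distances between adapted samples and emphasizing high-frequency details, can be illustrated with a small sketch. This is not the paper's implementation; it assumes a pairwise-similarity consistency formulation (cosine similarities between samples in a batch, turned into per-sample softmax distributions and compared with KL divergence between the source and adapted models) and a single-level Haar wavelet transform to isolate high-frequency detail bands. The function names (`relative_distance_loss`, `haar_high_frequency`) are hypothetical, and NumPy is used only to show the arithmetic; a real training loop would compute these on model outputs in a differentiable framework.

```python
import numpy as np


def _row_softmax(x):
    # Numerically stable softmax applied independently to each row.
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def similarity_distribution(samples):
    """For each sample i, a probability distribution over its cosine
    similarities to every other sample j != i (hypothetical helper)."""
    f = samples.reshape(len(samples), -1).astype(float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    sim = f @ f.T                           # (N, N) cosine similarities
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)].reshape(n, n - 1)  # drop self-similarity
    return _row_softmax(off_diag)


def relative_distance_loss(source_samples, adapted_samples, eps=1e-12):
    """Mean KL(p_source || p_adapted): small when the adapted samples keep
    the same relative similarity structure as the source samples."""
    p = similarity_distribution(source_samples)
    q = similarity_distribution(adapted_samples)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(kl.mean())


def haar_high_frequency(img):
    """Single-level orthonormal 2D Haar transform of an (H, W) array with even
    H and W; returns the three detail (high-frequency) bands stacked."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    horizontal_detail = (a - b + c - d) / 2
    vertical_detail = (a + b - c - d) / 2
    diagonal_detail = (a - b - c + d) / 2
    return np.stack([horizontal_detail, vertical_detail, diagonal_detail])
```

If adapted samples collapse toward one another (the overfitting failure mode described above), their similarity distributions become nearly uniform and the loss grows; a high-frequency term could, for example, apply a reconstruction or consistency loss to the output of `haar_high_frequency` rather than only to full images.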