Multi-Concept Customization of Text-to-Image Diffusion

Source: arXiv (v2, updated with results on the new CustomConcept101 dataset)
Dataset: https://www.cs.cmu.edu/~custom-diffusion/dataset.html
Project webpage: https://www.cs.cmu.edu/~custom-diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient.
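To make "only optimizing a few parameters in the text-to-image conditioning mechanism" concrete, here is a minimal sketch of selecting a small trainable subset by parameter name. The abstract does not say which parameters are tuned; the sketch assumes they are the key and value projections of the cross-attention layers. The parameter names (`attn1`, `attn2`, `to_k`, `to_v`), the layer sizes, and the helper `select_trainable` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Toy stand-in for a model parameter: a name and an element count."""
    name: str
    numel: int
    trainable: bool = False


def select_trainable(params, patterns=("attn2.to_k", "attn2.to_v")):
    """Mark a parameter trainable only if its name contains one of `patterns`.

    Assumption: `attn2` denotes cross-attention (text conditioning) and
    `to_k` / `to_v` its key and value projections. Everything else is frozen.
    """
    for p in params:
        p.trainable = any(pat in p.name for pat in patterns)
    return [p for p in params if p.trainable]


# Hypothetical inventory for a single transformer block.
params = [
    Param("block0.attn1.to_q.weight", 320 * 320),  # self-attention
    Param("block0.attn1.to_k.weight", 320 * 320),
    Param("block0.attn1.to_v.weight", 320 * 320),
    Param("block0.attn2.to_q.weight", 320 * 320),  # cross-attention query
    Param("block0.attn2.to_k.weight", 320 * 768),  # cross-attention key
    Param("block0.attn2.to_v.weight", 320 * 768),  # cross-attention value
    Param("block0.ff.weight", 320 * 1280),         # feed-forward
]

trainable = select_trainable(params)
fraction = sum(p.numel for p in trainable) / sum(p.numel for p in params)
print([p.name for p in trainable], f"{fraction:.1%} of this toy block")
```

In a real model the same name filter would be applied to the framework's own parameter list, and only the selected tensors would be passed to the optimizer. The fraction computed here describes this toy block only and says nothing about the proportion tuned in the actual method.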

