Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve this personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming. Despite attempts to develop finetuning-free methods, their generation quality is much lower than that of their finetuning-based counterparts. In this paper, we propose Joint-Image Diffusion (\jedi), an effective technique for learning a finetuning-free personalization model. Our key idea is to learn the joint distribution of multiple related text-image pairs that share a common subject. To facilitate learning, we propose a scalable synthetic dataset generation technique. Once trained, our model enables fast and easy personalization at test time simply by using reference images as input during the sampling process. Our approach does not require any expensive optimization process or additional modules and can faithfully preserve the identity represented by any number of reference images. Experimental results show that our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both prior finetuning-based and finetuning-free personalization baselines.
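To make the test-time idea concrete, the following is a minimal, hypothetical sketch of reference-conditioned sampling with a joint-image diffusion model: the reference images are clamped to their noised clean latents at every denoising step, and only the target slot of the joint set is actually generated. The \texttt{JointDenoiser} placeholder, its signature, and the simplified DDIM-style update are illustrative assumptions for exposition, not the released implementation.

\begin{verbatim}
# Hypothetical sketch: personalization as conditional sampling from a
# joint-image diffusion model. References are re-noised to the current
# timestep and concatenated with the evolving target latent.
import torch

class JointDenoiser(torch.nn.Module):
    """Placeholder for a denoiser that jointly processes N related images."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = torch.nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x_t, t, text_emb):
        # x_t: (N, C, H, W) joint set of noisy latents; predicts the noise.
        return self.net(x_t)

@torch.no_grad()
def personalized_sample(denoiser, ref_latents, text_emb, steps=50):
    """ref_latents: (R, C, H, W) clean latents of the reference images."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)

    target = torch.randn_like(ref_latents[:1])  # the image to generate
    for i in reversed(range(steps)):
        a_bar = alphas_bar[i]
        # Noise the known references to the current step; keep the target free.
        noisy_refs = (a_bar.sqrt() * ref_latents
                      + (1 - a_bar).sqrt() * torch.randn_like(ref_latents))
        x_t = torch.cat([noisy_refs, target], dim=0)  # joint set (R+1, C, H, W)

        eps = denoiser(x_t, i, text_emb)[-1:]  # noise prediction for target slot
        # Simplified deterministic (DDIM-style) update from the x0 estimate.
        x0 = (target - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        a_bar_prev = alphas_bar[i - 1] if i > 0 else torch.tensor(1.0)
        target = a_bar_prev.sqrt() * x0 + (1 - a_bar_prev).sqrt() * eps
    return target

# Usage: three reference views of the same subject guide one new generation.
denoiser = JointDenoiser()
refs = torch.randn(3, 4, 64, 64)
sample = personalized_sample(denoiser, refs, text_emb=None)
print(sample.shape)  # torch.Size([1, 4, 64, 64])
\end{verbatim}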