Recent text-to-image generative models have enabled us to transform our words into vibrant, captivating imagery. The surge of personalization techniques that has followed has also allowed us to imagine unique concepts in new scenes. However, an intriguing question remains: How can we generate a new, imaginary concept that has never been seen before? In this paper, we present the task of creative text-to-image generation, where we seek to generate new members of a broad category (e.g., generating a pet that differs from all existing pets). We leverage the under-studied Diffusion Prior models and show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior, resulting in a set of "prior constraints". To keep our generated concept from converging to existing members, we incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly unique creations. Finally, we show that our prior constraints can also serve as a strong mixing mechanism, allowing us to create hybrids between generated concepts and introducing even more flexibility into the creative process.
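The constrained optimization described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation, and every name in it (`unit`, `cos_grad`, `creative_search`) is hypothetical. It assumes plain unit vectors in place of diffusion prior outputs, a loss that rewards cosine similarity to the category embedding and penalizes similarity to the closest known member, and a nearest-member lookup standing in for the question-answering VLM. The abstract does not state the form of the objective, so that form is an assumption as well.

```python
import numpy as np

def unit(x):
    """Scale a vector to unit length."""
    return x / np.linalg.norm(x)

def cos_grad(v, t):
    """Gradient of the cosine similarity cos(v, t) with respect to v."""
    nv, nt = np.linalg.norm(v), np.linalg.norm(t)
    c = (v @ t) / (nv * nt)
    return t / (nv * nt) - c * v / nv**2

def creative_search(pos, members, steps=300, lr=0.05, lam=1.0,
                    query_every=50, seed=0):
    """Toy search for a vector close to `pos` but far from known members.

    pos      : unit vector for the broad category (assumed stand-in).
    members  : dict mapping a member name to its unit vector.
    Returns the optimized unit vector and the names added as constraints.
    """
    rng = np.random.default_rng(seed)
    v = unit(pos + 0.1 * rng.standard_normal(pos.shape))
    negatives, names = [], []
    for step in range(steps):
        if step % query_every == 0:
            # Assumed stand-in for the VLM query: pick the unused member
            # that the current concept resembles most and constrain on it.
            remaining = {k: e for k, e in members.items() if k not in names}
            if remaining:
                name = max(remaining, key=lambda k: v @ remaining[k])
                names.append(name)
                negatives.append(remaining[name])
        # Descend on: -cos(v, pos) + lam * max_j cos(v, negative_j)
        grad = -cos_grad(v, pos)
        if negatives:
            closest = max(negatives, key=lambda n: v @ n)
            grad = grad + lam * cos_grad(v, closest)
        v = unit(v - lr * grad)  # project back onto the unit sphere
    return v, names

# Small demonstration with synthetic embeddings (not real model outputs).
demo_rng = np.random.default_rng(1)
category = unit(demo_rng.standard_normal(16))
known = {name: unit(category + 0.5 * demo_rng.standard_normal(16))
         for name in ["cat", "dog", "hamster"]}
concept, constraints = creative_search(category, known)
```

In this sketch the set of negative constraints grows during optimization rather than being fixed in advance, which mirrors the adaptive behavior the abstract attributes to the VLM. A real system would replace the vectors with embeddings from an actual model and the lookup with a genuine VLM query.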