Text-conditioned image generation is a prevalent use of AI image synthesis, yet giving artists intuitive control over the output remains challenging. Current methods require multiple images and a textual prompt for each object in order to specify it as a concept, even when generating a single customized image. In contrast, our work, \verb|DiffMorph|, introduces a novel approach that synthesizes images mixing multiple concepts without the use of textual prompts. It integrates a sketch-to-image module so that user sketches can serve as input: \verb|DiffMorph| takes an initial image together with artist-drawn conditioning sketches and generates a morphed image. We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully. We seamlessly merge the images and the concepts from the sketches into a cohesive composition. We demonstrate the image generation capability of our approach through our results and a comparison with prompt-based image generation.