DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning

Large-scale text-to-image generation models have made remarkable progress in synthesizing high-quality, feature-rich, high-resolution images guided by text. However, these models often struggle with novel concepts, e.g., new styles or object entities. Recent attempts have employed fine-tuning or prompt-tuning strategies to teach a pre-trained diffusion model novel concepts from a set of reference images, but they tend to overfit to the given reference images, particularly in one-shot applications. This overfitting makes it hard to generate diverse, high-quality images while maintaining generation controllability. To tackle this challenge, we present a simple yet effective method called DreamArtist, which employs a positive-negative prompt-tuning learning strategy. Specifically, DreamArtist introduces both a positive and a negative embedding and trains them jointly. The positive embedding aggressively captures the salient characteristics of the reference image to drive diversified generation, and the negative embedding rectifies the inadequacies of the positive embedding. The model thus learns not only what is correct, but also what can be avoided or improved. We have conducted extensive experiments and evaluated the proposed method in terms of image similarity and diversity, generation controllability, and style cloning. DreamArtist achieves superior generation performance over existing methods. In addition, our evaluation on extended tasks, including concept composition and prompt-guided image editing, demonstrates its effectiveness in further applications.
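
To make the joint training of the two embeddings concrete, here is a minimal toy sketch in Python. It is not the DreamArtist implementation. It assumes the positive and negative predictions are combined with the commonly used negative-prompt guidance rule (start from the negative prediction and move toward the positive one), and it replaces the diffusion model with a hypothetical identity "denoiser" whose prediction simply equals the embedding. The names `guided_prediction`, `joint_step`, and `train`, as well as the values of `scale` and `lr`, are illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]


def guided_prediction(pred_pos: Vector, pred_neg: Vector, scale: float) -> Vector:
    """Combine predictions conditioned on the positive and negative embeddings.

    Assumed rule (negative-prompt guidance): neg + scale * (pos - neg).
    """
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between a prediction and its target."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)


def joint_step(emb_pos: Vector, emb_neg: Vector, target: Vector,
               scale: float, lr: float) -> Tuple[Vector, Vector]:
    """One gradient-descent step that updates BOTH embeddings.

    Toy assumption: the "denoiser" is the identity, so each prediction
    equals its embedding and the gradients can be written by hand.
    """
    n = len(target)
    pred = guided_prediction(emb_pos, emb_neg, scale)
    residual = [p - t for p, t in zip(pred, target)]
    # d(mse)/d(emb_pos) = 2 * residual * scale / n
    # d(mse)/d(emb_neg) = 2 * residual * (1 - scale) / n
    new_pos = [e - lr * 2.0 * r * scale / n for e, r in zip(emb_pos, residual)]
    new_neg = [e - lr * 2.0 * r * (1.0 - scale) / n for e, r in zip(emb_neg, residual)]
    return new_pos, new_neg


def train(target: Vector, scale: float = 3.0, lr: float = 0.05,
          steps: int = 50) -> Tuple[Vector, Vector]:
    """Jointly fit a positive and a negative embedding to one toy target."""
    emb_pos = [0.0] * len(target)
    emb_neg = [0.0] * len(target)
    for _ in range(steps):
        emb_pos, emb_neg = joint_step(emb_pos, emb_neg, target, scale, lr)
    return emb_pos, emb_neg
```

Because the loss is computed on the combined prediction, both embeddings receive gradient in every step: with `scale` greater than 1, the negative embedding is pushed in the opposite direction to the positive one, which is one way to read the abstract's claim that it "rectifies" what the positive embedding gets wrong.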
