For image generation with diffusion models (DMs), a negative prompt n can be used to complement the text prompt p, helping define properties not desired in the synthesized image. While this improves prompt adherence and image quality, finding good negative prompts is challenging. We argue that this is due to a semantic gap between humans and DMs, which makes good negative prompts for DMs appear unintuitive to humans. To bridge this gap, we propose a new diffusion-negative prompting (DNP) strategy. DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted as diffusion-negative sampling (DNS). Given p, one such image is sampled, which is then translated into natural language by the user or a captioning model, to produce the negative prompt n*. The pair (p, n*) is finally used to prompt the DM. DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively and can be easily combined with several DM variants.
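The DNS → caption → (p, n*) workflow described above can be sketched as a minimal pipeline. The functions below are hypothetical stand-ins: a real implementation would run diffusion-negative sampling with an actual diffusion model and produce n* with a human annotator or a captioning model, neither of which is shown here.

```python
def diffusion_negative_sample(prompt: str) -> str:
    """Stub for DNS: would sample an image that is least compliant
    with `prompt` under the diffusion model's distribution.
    Here the 'image' is just a placeholder string."""
    return f"<image least compliant with '{prompt}'>"


def caption(image: str) -> str:
    """Stub for the translation step: a user or a captioning model
    would describe the sampled image in natural language."""
    return f"description of {image}"


def dnp(prompt: str) -> tuple[str, str]:
    """Diffusion-negative prompting: derive the negative prompt n*
    via DNS plus captioning, and return the (p, n*) pair that is
    finally used to prompt the diffusion model."""
    negative_image = diffusion_negative_sample(prompt)
    n_star = caption(negative_image)
    return prompt, n_star


p, n_star = dnp("a photo of a cat wearing a hat")
```

Note that DNS itself requires no training: it reuses the pretrained diffusion model, and only the captioning step introduces an external component.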