Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding

The rapid evolution of text-to-image diffusion models has opened the door to generative AI, enabling textual descriptions to be translated into visually compelling, high-quality images. However, a persistent challenge in this domain is optimizing prompts so that abstract concepts are effectively translated into concrete objects. For example, text encoders can hardly express "peace", whereas they can easily illustrate olive branches and white doves. This paper introduces a novel approach, Prompt Optimizer for Abstract Concepts (POAC), specifically designed to enhance the performance of text-to-image diffusion models in interpreting abstract concepts and generating images from them. We propose a Prompt Language Model (PLM), which is initialized from a pre-trained language model and then fine-tuned on a curated dataset of abstract concept prompts. The dataset is created with GPT-4, which expands each abstract concept into a scene and concrete objects. Our framework employs a Reinforcement Learning (RL)-based optimization strategy that focuses on the alignment between the optimized prompts and the images generated from them by a Stable Diffusion model. Through extensive experiments, we demonstrate that POAC significantly improves the accuracy and aesthetic quality of generated images, particularly in the depiction of abstract concepts and in alignment with the optimized prompts. We also present a comprehensive analysis of our model's performance across diffusion models under different settings, showcasing its versatility and effectiveness in enhancing abstract concept representation.
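To make the RL-based optimization idea concrete, the following is a minimal toy sketch and not the POAC implementation. It assumes a softmax policy over three hypothetical candidate rewrites of the prompt "peace" instead of a fine-tuned language model, and it replaces the image-prompt alignment reward with a stub (`stub_reward`) that only counts concrete cue words; no diffusion model is run. The candidate prompts, the cue list, the function names, and the hyperparameters are all illustrative assumptions.

```python
import math
import random

# Hypothetical candidate rewrites of the abstract prompt "peace".
CANDIDATES = [
    "peace",
    "a white dove carrying an olive branch over a calm lake at sunrise",
    "a crowded city street at rush hour",
]

# Cue words the stub scorer looks for (an assumption for illustration only).
CONCRETE_CUES = {"dove", "olive", "branch", "lake", "sunrise"}


def stub_reward(prompt: str) -> float:
    """Placeholder for an image-prompt alignment score in [0, 1].

    A real pipeline would render the prompt with a diffusion model and score
    the resulting image; here we only measure the fraction of cue words found.
    """
    words = set(prompt.lower().split())
    return len(words & CONCRETE_CUES) / len(CONCRETE_CUES)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=500, lr=0.5, seed=0):
    """REINFORCE with a moving-average baseline over the candidate set."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one candidate prompt from the current policy.
        idx = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        reward = stub_reward(CANDIDATES[idx])
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(idx) w.r.t. logit j is 1[j == idx] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for prompt, p in zip(CANDIDATES, train()):
        print(f"{p:.3f}  {prompt}")
```

Calling `train()` returns the policy's probability for each candidate. Under this stub reward, probability mass shifts toward the rewrite that names concrete objects, which mirrors, in a very simplified form, the goal of rewarding prompts that ground an abstract concept in a depictable scene.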
