Well-designed prompts can guide text-to-image models to generate high-quality images. Although existing prompt engineering methods provide high-level guidance, novice users struggle to achieve the desired results with manually entered prompts, because the prompts they write differ from the prompts the model prefers. To bridge this distribution gap between user input behavior and model training datasets, we construct a Coarse-Fine Granularity Prompts dataset (CFP) and propose a User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization. CFP is a novel dataset for text-to-image tasks that combines coarse-grained and fine-grained prompts to facilitate the development of automated prompt generation methods. UF-FGTG is a novel framework that automatically translates user-input prompts into model-preferred prompts. Specifically, we propose a prompt refiner that continually rewrites prompts, so that users can select the results that match their needs. Meanwhile, we integrate image-related loss functions from the text-to-image model into the training process of the text generation model, so that it generates model-preferred prompts. Additionally, we propose an adaptive feature extraction module to ensure diversity in the generated results. Experiments demonstrate that our approach generates more visually appealing and diverse images than previous state-of-the-art methods, achieving an average improvement of 5% across six quality and aesthetic metrics.
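As a rough illustration of the continual-rewriting idea described in the abstract, the sketch below repeatedly refines a coarse prompt and keeps every intermediate version, so that a user could pick whichever one suits them. It is not the UF-FGTG implementation: `toy_refiner`, `TOY_MODIFIERS`, and `refine_continually` are hypothetical names, and the refiner here is a trivial stand-in that appends fixed modifiers, where the paper's refiner is a trained text generation model.

```python
from typing import Callable, List

# Hypothetical modifier vocabulary (an assumption for illustration only);
# a real refiner would be a trained text generation model.
TOY_MODIFIERS = ["highly detailed", "soft lighting", "sharp focus"]


def toy_refiner(prompt: str, step: int) -> str:
    """Stand-in for a learned refiner: append one modifier per step."""
    return f"{prompt}, {TOY_MODIFIERS[step % len(TOY_MODIFIERS)]}"


def refine_continually(
    coarse_prompt: str,
    refiner: Callable[[str, int], str],
    rounds: int,
) -> List[str]:
    """Return the coarse prompt followed by each successive rewrite."""
    candidates = [coarse_prompt]
    for step in range(rounds):
        # Each round rewrites the most recent candidate, not the original.
        candidates.append(refiner(candidates[-1], step))
    return candidates


candidates = refine_continually("a cat on a sofa", toy_refiner, rounds=3)
for index, candidate in enumerate(candidates):
    print(index, candidate)
```

Keeping the full list of candidates, instead of only the final rewrite, is what lets the user choose between a coarser and a finer prompt.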