Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images from textual prompts. However, crafting prompts that accurately capture the user's creative intent remains challenging. It often requires laborious trial and error to ensure that the model interprets a prompt as the user intends. To address these challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify uses a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Its interface lets users organize the generated images flexibly, and, based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts, enhancing desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.