Text-to-image generation models can generate images across a diverse range of subjects and styles from a single prompt. Recent work has proposed a variety of interaction methods that help users understand and make use of these models' capabilities. However, how to help users efficiently explore a model's capabilities and create effective prompts remains an open research question. In this paper, we present PromptCrafter, a novel mixed-initiative system that supports step-by-step crafting of text-to-image prompts. Through this iterative process, users can efficiently explore the model's capabilities and clarify their intent. PromptCrafter also helps users refine prompts by responding to clarifying questions generated by a Large Language Model. Lastly, users can review the work history and revert to a desired step. In this workshop paper, we discuss the design process of PromptCrafter and our plans for follow-up studies.