Text-to-image generative models are a new and powerful way to generate visual artwork. However, the open-ended nature of text as an interaction medium is double-edged: users can input anything and have access to an infinite range of generations, but when the result quality is poor they must resort to brute-force trial and error with the text prompt. We conduct a study exploring which prompt keywords and model hyperparameters help produce coherent outputs. In particular, we study prompts structured to include subject and style keywords and investigate the success and failure modes of these prompts. Our evaluation of 5493 generations over the course of five experiments spans 51 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people produce better outcomes from text-to-image generative models.