While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding the generation process remains a challenge. For concepts that are difficult to describe in language, users may struggle to write effective prompts. Moreover, many of these models are built as end-to-end systems that lack support for iteratively shaping an image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint lets users go beyond language by mixing prompts to express challenging concepts. Just as we iteratively tune colors by layering paint on a physical canvas, PromptPaint lets users apply different prompts to different areas of the canvas and at different times during the generative process. Through a set of studies, we characterize different approaches to mixing prompts, design trade-offs, and socio-technical challenges for generative models. With PromptPaint, we provide insights for designing future steerable generative tools.
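To make the idea of applying prompts to different canvas areas and generation times more concrete, here is a minimal sketch. It assumes that each "paint stroke" pairs a prompt's conditioning vector with a per-pixel opacity mask and a range of denoising steps during which it is active. The names `PromptStroke` and `mixed_conditioning` are hypothetical. Mixing by an opacity-weighted average is only one plausible rule and is not the authors' implementation. In a real system, the vectors would come from the model's text encoder rather than being written by hand.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PromptStroke:
    """Hypothetical 'paint stroke': a prompt applied to a canvas region over a step range."""
    embedding: Sequence[float]  # stand-in for a text-encoder embedding (assumption)
    mask: List[List[float]]     # per-pixel opacity in [0, 1], indexed mask[row][col]
    start_step: int             # first denoising step where the stroke is active (inclusive)
    end_step: int               # denoising step where the stroke stops (exclusive)


def mixed_conditioning(strokes: List[PromptStroke], step: int,
                       row: int, col: int) -> Optional[List[float]]:
    """Blend the active strokes at one pixel and step, like mixing paints by opacity.

    Returns the opacity-weighted average of the active embeddings, or None when
    no stroke covers this pixel at this step (the caller would then fall back to,
    e.g., unconditional guidance; also an assumption).
    """
    mixed: Optional[List[float]] = None
    total_weight = 0.0
    for stroke in strokes:
        # Time dimension: skip strokes outside their active step range.
        if not (stroke.start_step <= step < stroke.end_step):
            continue
        # Spatial dimension: skip strokes that do not cover this pixel.
        weight = stroke.mask[row][col]
        if weight <= 0.0:
            continue
        if mixed is None:
            mixed = [0.0] * len(stroke.embedding)
        for i, value in enumerate(stroke.embedding):
            mixed[i] += weight * value
        total_weight += weight
    if mixed is None:
        return None
    return [value / total_weight for value in mixed]


if __name__ == "__main__":
    # Toy 1x2 canvas with 2-dimensional placeholder "embeddings".
    sky = PromptStroke([1.0, 0.0], [[1.0, 1.0]], start_step=0, end_step=50)
    sunset = PromptStroke([0.0, 1.0], [[0.5, 0.0]], start_step=0, end_step=25)
    print(mixed_conditioning([sky, sunset], step=10, row=0, col=0))
    print(mixed_conditioning([sky, sunset], step=30, row=0, col=0))
```

In this toy example, a full-canvas "sky" stroke and a half-opacity "sunset" stroke that is active only for the first 25 steps produce a 2:1 blend at pixel (0, 0) during step 10. After step 25, only the "sky" prompt conditions that pixel. Pixel (0, 1), which the sunset mask leaves uncovered, receives pure "sky" conditioning throughout.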