Text-to-image generation has seen an explosion of interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI-generated art have quickly emerged. Based on a three-month ethnographic study, this paper identifies six types of prompt modifiers used by practitioners in the online community. The novel taxonomy of prompt modifiers provides researchers with a conceptual starting point for investigating the practice of text-to-image generation, and may also help practitioners of AI-generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities that this novel creative practice opens up in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of the broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) for future applications beyond text-to-image generation and AI-generated art.