Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales

The quality of text-to-image generation is continuously improving, yet the boundaries of its applicability are still unclear. In particular, refinement of the text input with the objective of achieving better results (commonly called prompt engineering) does not yet seem to have been geared towards work with pre-existing texts. We investigate whether text-to-image generation and prompt engineering can be used to generate basic illustrations of popular fairytales. Using Midjourney v4, we engage in action research with a dual aim: to attempt to generate 5 believable illustrations for each of 5 popular fairytales, and to define a prompt engineering process that starts from a pre-existing text and arrives at an illustration of it. We arrive at a tentative 4-stage process: i) initial prompt, ii) composition adjustment, iii) style refinement, and iv) variation selection. We also discuss three reasons why the generation model struggles with certain illustrations: difficulties with counts, bias from stereotypical configurations, and an inability to depict overly fantastic situations. Our findings are not limited to the specific generation model and are intended to be generalisable to future ones.

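As a rough illustration of the four stages named in the abstract, the following Python sketch models a prompt as a subject plus composition and style hints. It only assembles text: it does not call Midjourney, and every name in it (`PromptDraft`, `initial_prompt`, `adjust_composition`, `refine_style`, `select_variation`), along with the example scene and hints, is a hypothetical choice made for this sketch and not taken from the paper.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class PromptDraft:
    """Hypothetical container for a prompt under construction."""
    subject: str
    composition: tuple = field(default_factory=tuple)
    style: tuple = field(default_factory=tuple)

    def render(self) -> str:
        # Join subject, composition hints, and style hints into one prompt string.
        return ", ".join([self.subject, *self.composition, *self.style])


def initial_prompt(scene: str) -> PromptDraft:
    """Stage i: start from a scene description taken from the source text."""
    return PromptDraft(subject=scene.strip())


def adjust_composition(draft: PromptDraft, *hints: str) -> PromptDraft:
    """Stage ii: add hints about layout, framing, or placement of figures."""
    return replace(draft, composition=draft.composition + hints)


def refine_style(draft: PromptDraft, *hints: str) -> PromptDraft:
    """Stage iii: add hints about the visual style of the illustration."""
    return replace(draft, style=draft.style + hints)


def select_variation(candidates: list, chosen_index: int) -> str:
    """Stage iv: record which generated variation a human reviewer picked."""
    return candidates[chosen_index]


if __name__ == "__main__":
    draft = initial_prompt("a wolf knocking on the door of a brick house")
    draft = adjust_composition(draft, "wide shot", "wolf on the left")
    draft = refine_style(draft, "storybook illustration")
    print(draft.render())
    # The candidate labels stand in for images returned by a generation model.
    print(select_variation(["image 1", "image 2", "image 3", "image 4"], 2))
```

In this sketch the final stage is a manual choice: the function only records the index a reviewer selects among the generated candidates.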
