In the creative practice of text-to-image (TTI) generation, images are synthesized from textual prompts. By design, TTI models always yield an output, even if the prompt contains unknown terms. In such cases, the model may generate default images: images that closely resemble each other across many unrelated prompts. Studying default images is valuable for designing better solutions for prompt engineering and TTI generation. We present the first investigation into default images on Midjourney. We describe an initial study in which we manually created input prompts that trigger default images, and we report several ablation studies. Building on these, we conduct a computational analysis of over 750,000 images, revealing consistent default images across unrelated prompts. We also conduct an online user study investigating how default images may affect user satisfaction. Our work lays the foundation for understanding default images in TTI generation, highlighting their practical relevance as well as challenges and future research directions.