Recent progress in generative models, especially text-guided diffusion models, has enabled the production of aesthetically pleasing imagery resembling the works of professional human artists. However, the user has to carefully compose the textual description, called the prompt, and augment it with a set of clarifying keywords. Since aesthetics are challenging to evaluate computationally, human feedback is needed to determine the optimal prompt formulation and keyword combination. In this paper, we present a human-in-the-loop approach to learning the most useful combination of prompt keywords using a genetic algorithm. We also show how such an approach can improve the aesthetic appeal of images depicting the same description.
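To make the idea of evolving keyword combinations concrete, the following is a minimal sketch of a genetic algorithm over keyword subsets. It is not the paper's implementation: the abstract does not specify the encoding, the genetic operators, or the rating protocol, so the bit-mask genome, truncation selection, one-point crossover, bit-flip mutation, and all names (`build_prompt`, `evolve`, `rate`) are assumptions made for illustration. The `rate` callback stands in for the human step, where a person would score the image generated from each candidate prompt.

```python
import random
from typing import Callable, List, Optional, Sequence

Genome = List[int]  # one bit per keyword: 1 = append the keyword to the prompt


def build_prompt(description: str, keywords: Sequence[str], genome: Genome) -> str:
    """Join the description with the keywords selected by the genome."""
    chosen = [k for k, bit in zip(keywords, genome) if bit]
    return ", ".join([description] + chosen)


def evolve(
    keywords: Sequence[str],
    rate: Callable[[Genome], float],  # hypothetical stand-in for human feedback
    generations: int = 20,
    pop_size: int = 12,
    mutation_rate: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Genome:
    """Return the highest-rated keyword mask found (needs >= 2 keywords, pop_size >= 4)."""
    rng = rng or random.Random()
    n = len(keywords)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by rating. With real raters, ratings should be
        # cached, since every call to `rate` costs human effort.
        ranked = sorted(population, key=rate, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [parents[0][:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each keyword with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return max(population, key=rate)


if __name__ == "__main__":
    # Simulated rater (an assumption): rewards two keywords, penalizes one.
    kws = ["oil painting", "highly detailed", "blurry", "trending on artstation"]
    simulated = lambda g: g[0] + g[1] - g[2]
    best = evolve(kws, simulated, rng=random.Random(0))
    print(build_prompt("a castle on a hill", kws, best))
```

In an actual human-in-the-loop setting, `rate` would render an image from `build_prompt(...)` with a text-to-image model and collect a score from a person, so small populations and few generations keep the rating workload manageable.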