In creativity support and computational co-creativity contexts, discovering appropriate prompts for text-to-image generative models remains difficult. In many cases the creator wishes to evoke a certain impression with the image, but conveying that succinctly in a text prompt is a challenge: affective language is nuanced, complex, and model-specific. In this work we introduce a method for generating images conditioned on desired affect, quantified using a psychometrically validated three-component approach, that can be combined with conditioning on text descriptions. We first train a neural network to estimate the affective content of text and images from semantic embeddings, and then demonstrate how this network can be used to exert control over a variety of generative models. We show examples of how affect modifies the outputs, provide quantitative and qualitative analysis of the method's capabilities, and discuss possible extensions and use cases.
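The two-stage idea described above (estimate affect from an embedding, then use that estimate to steer generation) can be illustrated with a small, dependency-free sketch. Everything in it is an assumption made for illustration: the abstract does not specify the network architecture, the names of the three affect components, or the steering mechanism. Here a linear estimator stands in for the neural network, synthetic vectors stand in for semantic embeddings, and plain gradient descent on the embedding stands in for the control step.

```python
import random

AFFECT_DIMS = 3  # assumed: three affect components, names not given in the abstract


def predict(weights, embedding):
    """Linear affect estimator: one score per affect component."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


def train(embeddings, targets, lr=0.02, epochs=200):
    """Fit the estimator with per-sample gradient descent on squared error."""
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(AFFECT_DIMS)]
    for _ in range(epochs):
        for x, y in zip(embeddings, targets):
            pred = predict(weights, x)
            for k in range(AFFECT_DIMS):
                err = pred[k] - y[k]
                for j in range(dim):
                    weights[k][j] -= lr * err * x[j]
    return weights


def affect_loss(weights, embedding, target):
    """Squared distance between estimated and desired affect."""
    pred = predict(weights, embedding)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def steer(weights, embedding, target, lr=0.05, steps=100):
    """Nudge an embedding so its estimated affect moves toward `target`."""
    z = list(embedding)
    for _ in range(steps):
        pred = predict(weights, z)
        for j in range(len(z)):
            # Gradient of the squared error with respect to z[j].
            grad = sum(2 * (pred[k] - target[k]) * weights[k][j]
                       for k in range(AFFECT_DIMS))
            z[j] -= lr * grad
    return z


# Synthetic stand-ins for semantic embeddings and their affect labels.
rng = random.Random(0)
dim = 8
true_w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(AFFECT_DIMS)]
embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(64)]
targets = [predict(true_w, x) for x in embeddings]

weights = train(embeddings, targets)

desired = [0.8, -0.2, 0.5]  # hypothetical target affect
start = embeddings[0]
steered = steer(weights, start, desired)

before = affect_loss(weights, start, desired)
after = affect_loss(weights, steered, desired)
print(f"affect loss before steering: {before:.4f}, after: {after:.4f}")
```

In a real pipeline the steered embedding (or the gradient of the affect loss) would be passed to a generative model as a conditioning signal alongside the text prompt; how that is done for each model is not described in the abstract and is left out of this sketch.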