EPIG: Emotion-Based Prompting for Personalised Image Generation

Submitted to arXiv. 20 pages, 4 figures. Work on emotion-based prompt engineering for text-to-image diffusion models, with applications in personalized image generation.

Text-to-image diffusion models have achieved impressive results in synthesizing high-quality images from natural language prompts. However, commonly used prompting strategies remain relatively generic, which limits the model's ability to express emotional intent and nuanced affective attributes accurately. This work proposes EPIG, a method that enhances emotional expressiveness at the prompt level, before image generation. Grounded in a psychologically informed valence-arousal representation of emotion and using structured, role-aware prompt enrichment, EPIG strengthens the emotion-related components of a prompt without modifying or retraining the image generation backbone. The resulting emotion-aware prompts steer the generative process toward more emotionally coherent visual outputs and are particularly effective at controlling arousal. EPIG is lightweight and training-free, making it well suited to resource-constrained and personalized image generation scenarios. On a benchmark of 10 diverse prompts, EPIG reduces mean arousal error by 14% relative to naive insertion and by 12% relative to LLM-based prompt expansion, two strong baselines; both improvements are statistically significant. EPIG also preserves valence alignment and semantic consistency, as measured by CLIPScore and supported by ablation studies. The effect is more pronounced on prompts with explicit subjects such as humans, children, or animals, where the reduction reaches 17%, highlighting the subject-sensitive behavior of the method.
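The abstract does not give implementation details, so the following Python sketch only illustrates, under stated assumptions, two ideas it mentions: attaching emotion cues to individual prompt roles based on valence-arousal coordinates, and the relative arousal-error reduction used to report results. Everything concrete here is hypothetical and not EPIG's actual design: the `EMOTION_VA` coordinates, the two roles (`subject`, `scene`), the modifier phrases, and the definition of arousal error as a mean absolute difference. The predicted arousal values would come from some image-level affect estimator, which is not specified here.

```python
from dataclasses import dataclass

# Hypothetical valence-arousal (VA) coordinates in [-1, 1]; illustrative values only,
# not taken from the EPIG paper or from any published emotion lexicon.
EMOTION_VA = {
    "joy": (0.8, 0.6),
    "calm": (0.6, -0.6),
    "sadness": (-0.7, -0.4),
    "fear": (-0.7, 0.7),
}

# Hypothetical role-specific modifiers, selected by arousal level.
ROLE_MODIFIERS = {
    "subject": {
        "high": "with an animated, intense expression",
        "low": "with a relaxed, subdued expression",
    },
    "scene": {
        "high": "dynamic composition, vivid contrasting colors",
        "low": "soft lighting, still atmosphere",
    },
}

# Hypothetical global tone, selected by the sign of valence.
VALENCE_TONE = {"positive": "warm color palette", "negative": "cold, desaturated color palette"}


@dataclass
class StructuredPrompt:
    """A prompt split into two hypothetical roles: who is depicted and where."""
    subject: str
    scene: str


def enrich(prompt: StructuredPrompt, emotion: str) -> str:
    """Append VA-matched cues to each role; the original role text is kept verbatim."""
    valence, arousal = EMOTION_VA[emotion]
    level = "high" if arousal >= 0 else "low"
    tone = VALENCE_TONE["positive" if valence >= 0 else "negative"]
    subject = f"{prompt.subject} feeling {emotion}, {ROLE_MODIFIERS['subject'][level]}"
    scene = f"{prompt.scene}, {ROLE_MODIFIERS['scene'][level]}, {tone}"
    return f"{subject}; {scene}"


def mean_arousal_error(predicted: list[float], target: list[float]) -> float:
    """Mean absolute difference between predicted and target arousal values."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def relative_reduction(baseline_error: float, method_error: float) -> float:
    """Fraction by which method_error is lower than baseline_error."""
    return (baseline_error - method_error) / baseline_error


if __name__ == "__main__":
    base = StructuredPrompt(subject="a child", scene="a park at dusk")
    # Prints: a child feeling calm, with a relaxed, subdued expression;
    # a park at dusk, soft lighting, still atmosphere, warm color palette
    print(enrich(base, "calm"))
```

For the reporting arithmetic, hypothetical errors of 0.50 for a baseline and 0.43 for enriched prompts give `relative_reduction(0.50, 0.43)` of 0.14, the kind of percentage figure the abstract reports. Keeping the original role text verbatim and only appending cues is one plausible way to limit semantic drift, which is what a CLIPScore check would detect; whether EPIG works this way is not stated in the abstract.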

翻译：文本到图像扩散模型在从自然语言提示合成高质量图像方面取得了令人瞩目的成果。然而，常用的提示策略仍相对通用，限制了模型准确表达情感意图和细微情感属性的能力。本文提出EPIG方法，该方法在图像生成前于提示层面增强情感表现力。EPIG基于心理学启发的情感表征（效价-唤醒度）并利用结构化、角色感知的提示增强机制，在不修改或重新训练图像生成主干模型的情况下，丰富提示中与情感相关的组件。由此产生的情感感知提示引导生成过程产生更具情感连贯性的视觉输出，尤其在控制唤醒度方面效果显著。EPIG是一种轻量级、无需训练的方法，非常适合资源受限和个性化图像生成场景。在包含10个多样化提示的基准测试上的实验结果表明，与包括简单插入和基于LLM的提示扩展在内的强基线方法相比，EPIG将平均唤醒度误差分别降低了14%和12%。这些改善具有统计显著性。EPIG还保持了效价对齐和语义一致性（通过CLIPScore测量并得到消融研究支持）。该方法在包含人类、儿童或动物等显式主题的提示上效果更为显著，误差降低高达17%，凸显了所提方法对主题敏感的特性。