Recent years have witnessed remarkable progress in image generation, with users now able to create visually stunning, high-quality images. However, existing text-to-image diffusion models are proficient at generating concrete concepts (e.g., dogs) but struggle with more abstract ones (e.g., emotions). Several efforts have modified image emotions through color and style adjustments, but because the image content stays fixed, these approaches are limited in how effectively they can convey emotions. In this work, we introduce Emotional Image Content Generation (EICG), a new task that aims to generate semantically clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions. We further propose an attribute loss and emotion confidence to ensure the semantic diversity and emotion fidelity of the generated images. Our method outperforms state-of-the-art text-to-image approaches both quantitatively and qualitatively; for the quantitative comparison we derive three custom metrics: emotion accuracy, semantic clarity, and semantic diversity. Beyond generation, our method can aid emotion understanding and inspire emotional art design.
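The abstract names emotion accuracy as one of the three evaluation metrics but does not define it. One natural reading, which is assumed here rather than taken from the paper, is the share of generated images that a pretrained image emotion classifier assigns to the emotion category they were generated for. The sketch below computes that quantity; `emotion_accuracy` and `classify` are hypothetical names, and `classify` is a stand-in for whatever classifier an evaluation would actually use.

```python
from typing import Callable, Sequence


def emotion_accuracy(
    images: Sequence[object],
    target_emotions: Sequence[str],
    classify: Callable[[object], str],
) -> float:
    """Fraction of generated images whose predicted emotion matches its target.

    Assumption: `classify` is a hypothetical stand-in for a pretrained image
    emotion classifier that maps one image to one emotion label.
    """
    if len(images) != len(target_emotions):
        raise ValueError("need exactly one target emotion per image")
    if not images:
        raise ValueError("no images to evaluate")
    # Count images whose predicted label equals the requested emotion category.
    hits = sum(
        classify(image) == target
        for image, target in zip(images, target_emotions)
    )
    return hits / len(images)


if __name__ == "__main__":
    # Toy usage: images are plain IDs and the "classifier" is a lookup table.
    predictions = {"img0": "awe", "img1": "fear", "img2": "awe", "img3": "sadness"}
    images = ["img0", "img1", "img2", "img3"]
    targets = ["awe", "fear", "contentment", "sadness"]
    print(emotion_accuracy(images, targets, predictions.__getitem__))  # 0.75
```

Because the classifier is passed in as a plain callable, the same function works with any label set and any model; the metric itself is only as reliable as that classifier.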