EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Recent years have witnessed remarkable progress in image generation, where users can create visually astonishing, high-quality images. However, existing text-to-image diffusion models are proficient at generating concrete concepts (e.g., dogs) but struggle with more abstract ones (e.g., emotions). Several efforts have been made to modify image emotions through color and style adjustments, but they are limited in how effectively they can convey emotion because the image content stays fixed. In this work, we introduce Emotional Image Content Generation (EICG), a new task that aims to generate semantically clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions. An attribute loss and an emotion confidence are further proposed to ensure the semantic diversity and emotion fidelity of the generated images. Our method outperforms state-of-the-art text-to-image approaches both quantitatively and qualitatively, as measured with three custom metrics that we derive: emotion accuracy, semantic clarity, and semantic diversity. In addition to generation, our method can aid emotion understanding and inspire emotional art design.

