Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating

Previous studies on visual customization primarily rely on the objective alignment between various control signals (e.g., language, layout, and Canny edges) and the edited images. They largely ignore subjective emotional content and, more importantly, lack general-purpose foundation models for affective visual customization. With this in mind, this paper proposes an LLM-centric Affective Visual Customization (L-AVC) task, which focuses on generating images while modifying their subjective emotions via a multimodal LLM. Further, this paper contends that two problems are both important and challenging in the L-AVC task: how to make the model efficiently align the emotion conversion at the semantic level (named inter-emotion semantic conversion), and how to precisely retain emotion-agnostic content (named exter-emotion semantic retaining). To this end, this paper proposes an Efficient and Precise Emotion Manipulating (EPEM) approach for editing subjective emotions in images. Specifically, an Efficient Inter-emotion Converting (EIC) module is tailored to make the LLM efficiently align the semantics of the emotion conversion before and after editing, followed by a Precise Exter-emotion Retaining (PER) module that precisely retains the emotion-agnostic content. Comprehensive experimental evaluations on our constructed L-AVC dataset demonstrate the clear advantage of the proposed EPEM approach over several state-of-the-art baselines on the L-AVC task. This justifies the importance of emotion information for L-AVC and the effectiveness of EPEM in efficiently and precisely manipulating such information.