Visual emotion analysis holds significant research value in both computer vision and psychology. However, existing methods suffer from limited generalizability due to the ambiguity of emotion perception and the diversity of data scenarios. To tackle this issue, we introduce UniEmoX, a cross-modal semantic-guided large-scale pretraining framework. Inspired by psychological research emphasizing that the process of emotional exploration is inseparable from the interaction between individuals and their environment, UniEmoX integrates scene-centric and person-centric low-level spatial structural information from images, aiming to derive more nuanced and discriminative emotional representations. By exploiting the similarity between paired and unpaired image-text samples, UniEmoX distills rich semantic knowledge from the CLIP model to enhance emotional embedding representations more effectively. To the best of our knowledge, this is the first large-scale pretraining framework that integrates psychological theories with contemporary contrastive learning and masked image modeling techniques for emotion analysis across diverse scenarios. Additionally, we develop a visual emotion dataset titled Emo8. Emo8 samples span a range of styles, including cartoon, natural, realistic, science fiction, and advertising covers, encompassing nearly all common emotional scenes. Comprehensive experiments on six benchmark datasets across two downstream tasks validate the effectiveness of UniEmoX. The source code is available at https://github.com/chincharles/u-emo.
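The abstract mentions distilling semantic knowledge from CLIP via image-text similarity. The following is a minimal, hypothetical sketch (not the authors' implementation; function names and embedding sizes are assumptions) of one common formulation: aligning a student's image-text similarity distribution with a CLIP-like teacher's via KL divergence.

```python
# Hypothetical sketch of similarity-based semantic distillation from a
# CLIP-like teacher; NOT the UniEmoX implementation.
import torch
import torch.nn.functional as F


def distill_loss(student_img: torch.Tensor,
                 teacher_img: torch.Tensor,
                 teacher_txt: torch.Tensor,
                 tau: float = 0.07) -> torch.Tensor:
    """KL divergence between the student's and the teacher's
    image-to-text similarity distributions over the batch."""
    s = F.normalize(student_img, dim=-1)   # student image embeddings
    ti = F.normalize(teacher_img, dim=-1)  # teacher (CLIP) image embeddings
    tt = F.normalize(teacher_txt, dim=-1)  # teacher (CLIP) text embeddings
    # Cosine-similarity logits against every text embedding in the batch.
    student_logits = s @ tt.t() / tau
    teacher_logits = ti @ tt.t() / tau
    teacher_probs = teacher_logits.softmax(dim=-1)
    return F.kl_div(student_logits.log_softmax(dim=-1),
                    teacher_probs, reduction="batchmean")


if __name__ == "__main__":
    torch.manual_seed(0)
    img = torch.randn(8, 512)    # student visual encoder output (assumed dim)
    t_img = torch.randn(8, 512)  # frozen teacher image embeddings
    t_txt = torch.randn(8, 512)  # frozen teacher text embeddings
    loss = distill_loss(img, t_img, t_txt)
    print(float(loss))
```

In this formulation the teacher's similarity structure over paired and unpaired image-text samples serves as a soft target, which is one standard way such cross-modal distillation objectives are built.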