Sentiment and emotion understanding are essential to applications such as human-computer interaction and depression detection. While Multimodal Large Language Models (MLLMs) demonstrate robust general capabilities, they face considerable challenges in affective computing, particularly in detecting subtle facial expressions and handling complex emotion-related tasks such as emotion reason inference and understanding emotions in long-context scenarios. Furthermore, no unified MLLM yet handles both sentiment- and emotion-related tasks effectively. To address these challenges, we explore multi-task training strategies for MLLMs in affective computing and introduce Emotion Universe (EmoVerse), an MLLM designed to handle a broad spectrum of sentiment- and emotion-related tasks. In addition, EmoVerse can deeply analyze the underlying causes of emotional states. We also introduce the Affective Multitask (AMT) Dataset, which supports multimodal sentiment analysis, multimodal emotion recognition, facial expression recognition, emotion reason inference, and emotion cause-pair extraction tasks. Extensive experiments demonstrate that EmoVerse outperforms existing methods, achieving state-of-the-art results on sentiment- and emotion-related tasks. The code is available at https://github.com/liaolea/EmoVerse.