Multimodal Emotion Recognition (MER) is an important research topic. This paper advocates a transformative paradigm for MER. Current approaches typically rely on a small set of basic emotion labels, which cannot represent the full spectrum of human emotions. Such simplistic categories fail to capture the complexity and subtlety of emotional experience, which limits both generalizability and practicality. We therefore propose a new paradigm, Open-vocabulary MER (OV-MER), which relaxes the label space so that a model may predict any number of emotions from any category. To support this transition, we provide a comprehensive solution comprising a newly constructed database annotated collaboratively by LLMs and humans, corresponding evaluation metrics, and a series of benchmarks. We hope this work advances emotion recognition from basic emotions to more nuanced ones and contributes to the development of emotional AI.