Open-Vocabulary Multimodal Emotion Recognition (OV-MER) aims to predict emotions without being constrained by a predefined label space, enabling fine-grained emotion understanding. Unlike traditional discriminative methods, OV-MER leverages generative models to capture the full spectrum of emotions and employs emotion wheels (EWs) for metric calculation. Previous approaches (e.g., AffectGPT) rely primarily on a token-level loss during training. However, this objective is misaligned with the evaluation metrics used in OV-MER, and those metrics cannot be optimized directly via gradient backpropagation. To address this limitation, we propose AffectGPT-R1, a reinforcement learning framework that treats EW-based metrics as a reward function and applies policy optimization to maximize this reward. Additionally, we introduce an explicit reasoning process and examine its necessity for OV-MER. To further guide model behavior, we incorporate auxiliary rewards that regularize both emotion reasoning and emotion prediction, and we apply a length penalty to mitigate reward hacking. Experimental results demonstrate that AffectGPT-R1 yields significant performance improvements on OV-MER. Moreover, our approach enhances generalized emotion understanding, achieving state-of-the-art results on MER-UniBench. Our code is provided in the supplementary material and will be released to facilitate future research.
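To make the training objective concrete, the sketch below illustrates in Python/PyTorch how a non-differentiable EW-based metric can serve as a reward, combined with an auxiliary reward and a length penalty, and optimized with a group-normalized policy-gradient update. This is a minimal sketch, not the AffectGPT-R1 implementation: the function names (`ew_metric_reward`, `format_reward`), the penalty coefficients, and the group-normalization scheme are all assumptions for illustration.

```python
# Minimal sketch (not the authors' released code) of metric-as-reward
# policy optimization. ew_metric_reward and format_reward are hypothetical
# callables standing in for an EW-based metric and an auxiliary reward.
import torch

def total_reward(pred_labels, gold_labels, response_len,
                 ew_metric_reward, format_reward,
                 len_limit=512, len_coef=0.01):
    """Combine the EW-based metric reward with auxiliary and length terms."""
    r = ew_metric_reward(pred_labels, gold_labels)    # non-differentiable metric
    r += format_reward(pred_labels)                   # auxiliary reward on the prediction
    r -= len_coef * max(0, response_len - len_limit)  # length penalty vs. reward hacking
    return r

def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-normalized advantages over G > 1 sampled responses per input."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def policy_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss: logprobs are per-response summed token log-probs."""
    adv = group_advantages(rewards).detach()  # reward needs no gradient path
    return -(adv * logprobs).mean()
```

Because the reward enters only through the (detached) advantage weights, the metric itself never needs a gradient, which is what allows EW-based metrics to be optimized despite being non-differentiable.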