Emotion Recognition in Conversation (ERC) has become a fundamental capability for large language models (LLMs) in human-centric interaction. Beyond accurate recognition, coherent emotional expression is also crucial, yet both are limited by the scarcity and static nature of high-quality annotated data. In this work, we propose SELF-EMO, a self-evolution framework grounded in the hypothesis that better emotion prediction leads to more consistent emotional responses. We introduce two auxiliary tasks, emotional understanding and emotional expression, and design a role-based self-play paradigm in which the model acts as both an emotion recognizer and a dialogue responder. Through iterative interactions, the model generates diverse conversational trajectories, enabling scalable data generation. To ensure quality, we adopt a data flywheel mechanism that filters candidate predictions and responses using a smoothed IoU-based reward and feeds the selected samples back for continuous self-improvement without external supervision. We further develop SELF-GRPO, a reinforcement learning algorithm that stabilizes optimization with multi-label alignment rewards and group-level consistency signals. Experiments on IEMOCAP, MELD, and EmoryNLP show that SELF-EMO achieves state-of-the-art performance, improving accuracy by 6.33% on Qwen3-4B and 8.54% on Qwen3-8B, which demonstrates its effectiveness and generalization across model sizes.
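To make the filtering step concrete, the following is a minimal sketch of a smoothed IoU reward over multi-label emotion sets, followed by a threshold filter in the spirit of the data flywheel. The abstract does not specify the smoothing, the threshold, or how the target label set is obtained, so the additive smoothing term `eps`, the cutoff `threshold`, and the function names `smoothed_iou` and `filter_candidates` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Iterable, List, Set, Tuple


def smoothed_iou(predicted: Iterable[str], target: Iterable[str], eps: float = 1.0) -> float:
    """Intersection-over-union of two label sets with additive smoothing.

    ASSUMPTION: the smoothing form (adding eps to numerator and denominator)
    is illustrative; the paper's exact formula is not given in the abstract.
    With eps > 0 the score is defined for empty sets and is never exactly 0.
    """
    pred, tgt = set(predicted), set(target)
    intersection = len(pred & tgt)
    union = len(pred | tgt)
    return (intersection + eps) / (union + eps)


def filter_candidates(
    candidates: List[Set[str]],
    target: Set[str],
    threshold: float = 0.6,  # hypothetical cutoff, not taken from the paper
    eps: float = 1.0,
) -> List[Tuple[Set[str], float]]:
    """Keep only candidate label sets whose reward reaches the threshold."""
    scored = [(cand, smoothed_iou(cand, target, eps)) for cand in candidates]
    return [(cand, score) for cand, score in scored if score >= threshold]


if __name__ == "__main__":
    target = {"joy"}
    candidates = [{"joy"}, {"joy", "surprise"}, {"anger"}]
    for cand, score in filter_candidates(candidates, target):
        print(sorted(cand), round(score, 3))
```

Under these assumptions, an exact match scores 1.0, a partially overlapping prediction receives partial credit, and a disjoint prediction still gets a small non-zero reward, which is one plausible reason to smooth a reward used for reinforcement learning.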