This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates users' emotional states and their underlying causes into the model's internal reasoning process, so that emotion-aware reasoning is internalized rather than imposed as explicit supervision. The model is trained with a two-stage progressive strategy: the first stage performs speech-text alignment and emotional attribute modeling via self-distillation, and the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance in emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.