As generative AI advances, empathy in human-AI interaction becomes essential. While prior work focuses on emotional reflection, emotional exploration, which is key to deeper engagement, remains overlooked. Existing LLMs rely on text, which captures only limited emotional nuance. To address this, we propose RE-LLM, a speech-LLM that integrates dimensional emotion embeddings and auxiliary learning. Experiments show statistically significant gains in empathy metrics across three datasets. On ESD, RE-LLM improves the Emotional Reaction score by 14.79% and 6.76% relative to the text-only and speech-LLM baselines, respectively. Notably, it raises the Exploration score by 35.42% and 3.91% on IEMOCAP, 139.28% and 9.83% on ESD, and 60.95% and 22.64% on MSP-PODCAST. It also boosts unweighted accuracy in speech emotion recognition by 5.4% on IEMOCAP, 2.3% on ESD, and 6.9% on MSP-PODCAST. These results highlight RE-LLM's enriched emotional understanding and improved empathetic response generation.