Recognising affect, which encompasses emotions, moods, and feelings, plays a pivotal role in human communication. In conversational artificial intelligence (AI), the ability to discern and respond to human affective cues is critical for creating engaging and empathetic interactions. This study examines the capacity of large language models (LLMs) to recognise human affect in conversations, focusing on both open-domain chit-chat dialogues and task-oriented dialogues. Using three diverse datasets, namely IEMOCAP, EmoWOZ, and DAIC-WOZ, which span dialogues from casual conversations to clinical interviews, we evaluate and compare LLMs' performance in affect recognition. We explore the zero-shot and few-shot capabilities of LLMs through in-context learning (ICL), as well as their model capacities through task-specific fine-tuning. We also take into account the potential impact of automatic speech recognition (ASR) errors on LLM predictions. With this work, we aim to shed light on the extent to which LLMs can replicate human-like affect recognition capabilities in conversations.