Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history while simultaneously inferring the emotion and intent of the current utterance. MC-EIU is an enabling technology for many human-computer interfaces. However, existing datasets fall short in annotation, modality, language diversity, and accessibility. In this work, we propose an MC-EIU dataset, which features 7 emotion categories, 9 intent categories, 3 modalities (textual, acoustic, and visual content), and 2 languages (English and Mandarin). Furthermore, it is completely open-source and freely accessible. To our knowledge, MC-EIU is the first comprehensive and rich emotion and intent joint understanding dataset for multimodal conversation. Together with the release of the dataset, we also develop an Emotion and Intent Interaction (EI$^2$) network as a reference system that models the deep correlation between emotion and intent in multimodal conversation. With comparative experiments and ablation studies, we demonstrate the effectiveness of the proposed EI$^2$ method on the MC-EIU dataset. The dataset and code will be made available at: https://github.com/MC-EIU/MC-EIU.