HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

Emotion recognition in conversations is challenging due to the multi-modal nature of emotion expression. We propose a hierarchical cross-attention model (HCAM) approach to multi-modal emotion recognition that combines recurrent and co-attention neural network models. The input to the model consists of two modalities: (i) audio data, processed through a learnable wav2vec approach, and (ii) text data, represented using a bidirectional encoder representations from transformers (BERT) model. The audio and text representations are processed using a set of bi-directional recurrent neural network layers with self-attention that convert each utterance in a given conversation to a fixed-dimensional embedding. To incorporate contextual knowledge and information across the two modalities, the audio and text embeddings are combined using a co-attention layer that attempts to weigh the utterance-level embeddings relevant to the task of emotion recognition. The neural network parameters in the audio layers, the text layers, and the multi-modal co-attention layers are hierarchically trained for the emotion classification task. We perform experiments on three established datasets, namely IEMOCAP, MELD and CMU-MOSI, where we illustrate that the proposed model improves significantly over other benchmarks and helps achieve state-of-the-art results on all these datasets.
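The co-attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, and the function names (`cross_attend`, `co_attention`) and the fusion by concatenation are hypothetical choices made only for illustration. Each utterance is assumed to be already reduced to a fixed-dimensional embedding per modality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attend(queries, keys_values):
    """For each query, return a softmax-weighted average of keys_values.

    Assumption: keys and values are the same vectors (no learned projections).
    """
    dim = len(keys_values[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended


def co_attention(audio, text):
    """Fuse per-utterance audio and text embeddings in both directions.

    Audio utterances attend over text utterances and vice versa; the two
    context vectors for each utterance are concatenated (hypothetical fusion).
    """
    audio_ctx = cross_attend(audio, text)   # audio queries, text keys/values
    text_ctx = cross_attend(text, audio)    # text queries, audio keys/values
    return [a + t for a, t in zip(audio_ctx, text_ctx)]


# Two utterances in a conversation, with 2-dimensional toy embeddings.
audio = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
fused = co_attention(audio, text)
```

In a trained model the queries, keys, and values would pass through learned linear layers, and the fused vectors would feed an emotion classifier; both are omitted here to keep the sketch short.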

