Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis

Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Pre-trained models such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden-unit BERT (HuBERT) make it possible to generate lexical and acoustic representations, respectively, that benefit speech recognition applications. We investigated the use of pre-trained model representations for estimating dimensional emotions (activation, valence, and dominance) from speech. We observed that valence may rely heavily on lexical representations, whereas activation and dominance rely mostly on acoustic information. In this work, we used multi-modal fusion of representations from pre-trained models to obtain state-of-the-art speech emotion estimation, with relative improvements of 100% and 30% in concordance correlation coefficient (CCC) on valence estimation over standard acoustic and lexical baselines, respectively. Finally, we investigated the robustness of pre-trained model representations to noise and reverberation and found that lexical and acoustic representations are affected differently. Lexical representations are more robust to these distortions than acoustic representations, and knowledge distillation from a multi-modal model helps improve the noise robustness of acoustic-based models.
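
For reference, the CCC metric and the notion of relative improvement used in the abstract can be sketched as follows. This is a minimal illustration based on the standard definition of Lin's concordance correlation coefficient, not the authors' evaluation code; the function names `concordance_cc` and `relative_improvement` and all sample values are hypothetical.

```python
from statistics import fmean


def concordance_cc(predictions, labels):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    CCC = 2 * cov(p, l) / (var(p) + var(l) + (mean(p) - mean(l))^2),
    using population (biased) variance and covariance.
    """
    if len(predictions) != len(labels) or len(predictions) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    mean_p = fmean(predictions)
    mean_l = fmean(labels)

    var_p = fmean([(p - mean_p) ** 2 for p in predictions])
    var_l = fmean([(l - mean_l) ** 2 for l in labels])
    cov = fmean([(p - mean_p) * (l - mean_l) for p, l in zip(predictions, labels)])

    denom = var_p + var_l + (mean_p - mean_l) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical; treated here as perfect
        # agreement (a convention chosen for this sketch).
        return 1.0
    return 2.0 * cov / denom


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score (1.0 means +100%)."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical valence ratings, for illustration only.
reference = [0.1, 0.4, 0.5, 0.8, 0.9]
estimate = [0.2, 0.3, 0.6, 0.7, 0.9]
print(f"CCC = {concordance_cc(estimate, reference):.3f}")
```

Unlike Pearson correlation, CCC also penalizes differences in mean and scale between estimates and labels, so a constant offset lowers the score even when the two sequences are perfectly correlated. Under this definition of relative improvement, a 100% gain means the CCC doubled with respect to the baseline.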

