Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on a few available datasets per task. Tasks include arousal, valence, dominance, emotional categories, or tone of voice. These models are mainly evaluated in terms of correlation or recall, and their predictions always contain some errors. The errors manifest themselves in model behaviour, which can differ greatly along different dimensions even when the same recall or correlation is achieved by the model. This paper introduces a testing framework to investigate the behaviour of speech emotion recognition models by requiring different metrics to reach a certain threshold in order to pass a test. The test metrics can be grouped in terms of correctness, fairness, and robustness. The paper further provides a method to specify test thresholds for fairness tests automatically, based on the datasets used, and recommendations on how to select the remaining test thresholds. Seven different transformer-based models and a baseline model are tested for arousal, valence, dominance, and emotional categories. The test results highlight that models with high correlation or recall might rely on shortcuts, such as text sentiment, to achieve it, and differ in terms of fairness.
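The pass/fail idea described above can be sketched as a minimal threshold-based test harness. This is an illustrative sketch only, not the paper's actual framework or API: the metric names, groupings, and threshold values below are hypothetical, and the direction flag distinguishes metrics that must reach a threshold (e.g. a correlation) from those that must stay below one (e.g. a fairness gap).

```python
# Minimal sketch of a threshold-based testing harness in the spirit of
# the framework described above. All metric names, group labels, and
# threshold values are hypothetical examples, not the paper's actual ones.

def run_tests(metrics, thresholds):
    """Each test passes iff its measured metric meets its threshold.

    metrics:    {"group/test_name": measured_value}
    thresholds: {"group/test_name": (threshold, direction)}
                direction "ge": value must be >= threshold (e.g. recall, CCC)
                direction "le": value must be <= threshold (e.g. group gap)
    Returns {"group/test_name": (status, measured_value)}.
    """
    results = {}
    for name, (thr, direction) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            results[name] = ("skipped", None)  # metric not computed
            continue
        passed = value >= thr if direction == "ge" else value <= thr
        results[name] = ("passed" if passed else "failed", value)
    return results


# Hypothetical results for a single arousal model, one test per group.
metrics = {
    "correctness/concordance_cc": 0.71,  # correlation with gold labels
    "fairness/sex_score_gap": 0.04,      # prediction gap between groups
    "robustness/noise_cc_drop": 0.12,    # correlation drop under noise
}
thresholds = {
    "correctness/concordance_cc": (0.50, "ge"),
    "fairness/sex_score_gap": (0.075, "le"),
    "robustness/noise_cc_drop": (0.10, "le"),
}

results = run_tests(metrics, thresholds)
```

In this toy run the correctness and fairness tests pass while the robustness test fails, illustrating how two models with identical correlation can still differ in the share of tests they pass.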