Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on the few datasets available per task. Tasks include arousal, valence, dominance, emotional categories, or tone of voice. These models are evaluated mainly in terms of correlation or recall, and their predictions always contain some error. That error manifests itself in model behaviour, which can differ widely along different dimensions even when models achieve the same recall or correlation. This paper introduces a testing framework for investigating the behaviour of speech emotion recognition models by requiring different metrics to reach a certain threshold in order to pass a test. The test metrics can be grouped in terms of correctness, fairness, and robustness. The framework also provides a method for automatically specifying test thresholds for the fairness tests, based on the datasets used, along with recommendations on how to select the remaining thresholds. Nine transformer-based models, an xLSTM-based model, and a convolutional baseline model are tested for arousal, valence, dominance, and emotional categories. The test results highlight that models with high correlation or recall may rely on shortcuts, such as text sentiment, and differ in terms of fairness.
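The pass/fail logic described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the metric and test names (`correctness_ccc`, `fairness_sex_gap`, `robustness_noise_ccc`) and the threshold values are hypothetical placeholders.

```python
# Sketch of threshold-based testing: a model passes a test only if the
# corresponding metric clears its threshold (hypothetical names and values).

def passes_test(metric_value: float, threshold: float) -> bool:
    """Return True if the metric reaches the required threshold."""
    return metric_value >= threshold

def run_test_suite(metrics: dict, thresholds: dict) -> dict:
    """Map each test name to a pass/fail result."""
    return {name: passes_test(metrics[name], thresholds[name])
            for name in thresholds}

# Hypothetical results for one model, grouped by correctness,
# fairness, and robustness as in the framework described above.
metrics = {
    "correctness_ccc": 0.62,      # concordance correlation on a test set
    "fairness_sex_gap": 0.91,     # e.g. ratio of per-group performance
    "robustness_noise_ccc": 0.48, # correlation under added noise
}
thresholds = {
    "correctness_ccc": 0.50,
    "fairness_sex_gap": 0.85,
    "robustness_noise_ccc": 0.50,
}

results = run_test_suite(metrics, thresholds)
```

Here the model would pass the correctness and fairness tests but fail the robustness test, illustrating how two models with equal overall correlation can behave differently across test groups.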