Speech emotion recognition (SER) has gained significant attention due to its many application areas, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach relies on careful feature selection and analysis to build efficient SER systems. To address this problem through model explainability, we employ a feature evaluation loop that uses Shapley values to iteratively refine the feature set. This process balances model performance against transparency, enabling a comprehensive understanding of the model's predictions. The proposed approach offers two main advantages. First, it identifies and removes irrelevant and redundant features, leading to a more effective model. Second, it promotes explainability, making the model's predictions easier to understand and revealing which features are crucial for determining emotion. The effectiveness of the proposed method is validated on four SER benchmarks: the Toronto emotional speech set (TESS), the Berlin Database of Emotional Speech (EMO-DB), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset, where it outperforms state-of-the-art methods. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.
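The feature evaluation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a toy setting: exact Shapley values are computed over feature subsets, the value of a subset is the training accuracy of a nearest-centroid classifier, and the least useful feature is dropped while its Shapley value stays at or below a threshold. The synthetic data, the classifier, the `threshold` parameter, and all function names are hypothetical choices made for illustration only.

```python
from functools import lru_cache
from itertools import combinations
from math import factorial
import random


def shapley_values(features, value):
    """Exact Shapley value of each feature for the set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def make_toy_data(n=200, seed=0):
    """Hypothetical data: features 0 and 1 are informative, 2 and 3 are noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are balanced
        x = [rng.gauss(2.0 * label, 1.0), rng.gauss(1.5 * label, 1.0),
             rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        rows.append((x, label))
    return rows


def make_scorer(rows):
    """Return a cached function mapping a feature subset to an accuracy."""
    @lru_cache(maxsize=None)
    def score(subset):
        if not subset:
            return 0.5  # assumed chance level for two balanced classes
        idx = sorted(subset)
        centroids = {}
        for label in (0, 1):
            pts = [x for x, y in rows if y == label]
            centroids[label] = [sum(p[i] for p in pts) / len(pts) for i in idx]
        correct = 0
        for x, y in rows:
            dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, mu))
                    for c, mu in centroids.items()}
            correct += min(dist, key=dist.get) == y
        return correct / len(rows)
    return score


def refine(features, score, threshold=0.01):
    """Drop the lowest-Shapley feature until all remaining ones exceed threshold."""
    kept = list(features)
    while len(kept) > 1:
        phi = shapley_values(kept, score)
        worst = min(kept, key=phi.get)
        if phi[worst] > threshold:
            break
        kept.remove(worst)
    return kept


rows = make_toy_data()
score = make_scorer(rows)
all_features = [0, 1, 2, 3]
phi = shapley_values(all_features, score)
kept = refine(all_features, score)
print("Shapley values:", phi)
print("Kept features:", kept)
```

Exact enumeration costs a number of evaluations that grows exponentially with the feature count, so it only suits a handful of features. With the high-dimensional acoustic feature sets the abstract refers to, an approximation of the Shapley values would be needed instead.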