The intersection of technology and mental health has spurred innovative approaches to assessing emotional well-being, particularly through computational techniques applied to audio data. This study explores Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models trained on wavelet-extracted features and Mel-frequency cepstral coefficients (MFCCs) for emotion detection from speech. Data augmentation, feature extraction, normalization, and model training were conducted to evaluate the models' performance in classifying emotional states. The CNN model achieved 61% accuracy, outperforming the LSTM model's 56%. Both models performed best on emotions with distinctive acoustic signatures, such as surprise and anger, which exhibit pronounced pitch and speaking-rate variation. Recommendations include exploring more advanced data augmentation techniques, combining feature extraction methods, and integrating linguistic analysis with acoustic characteristics to improve accuracy in mental health diagnostics. Collaboration on standardized dataset collection and sharing is also recommended to advance affective computing and mental health care interventions.
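The preprocessing steps named above (data augmentation and feature normalization) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are illustrative, the waveform is synthetic, and a random matrix stands in for a real MFCC array.

```python
import numpy as np

def add_noise(signal, noise_level=0.005, rng=None):
    """Augmentation: inject Gaussian noise into the waveform."""
    rng = rng or np.random.default_rng(0)
    return signal + noise_level * rng.standard_normal(signal.shape)

def time_shift(signal, shift):
    """Augmentation: circularly shift the waveform to vary timing."""
    return np.roll(signal, shift)

def zscore_normalize(features):
    """Normalize each coefficient track to zero mean, unit variance."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True) + 1e-8
    return (features - mean) / std

# Toy 1-second "clip" at 16 kHz standing in for recorded speech.
wave = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
augmented = time_shift(add_noise(wave), shift=800)

# Placeholder (n_mfcc, n_frames) matrix standing in for extracted MFCCs.
mfcc_like = np.random.default_rng(1).standard_normal((13, 100)) * 5 + 2
normalized = zscore_normalize(mfcc_like)
```

In a real pipeline the normalized feature matrices would then be fed to the CNN or LSTM classifier; per-coefficient z-scoring keeps features on a comparable scale, which typically stabilizes training.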